

BIOMETRIKA 




FOUNDED BY 


W. F. R. WELDON, FRANCIS GALTON AND KARL PEARSON


MANAGING EDITOR 


E. S. PEARSON 


ASSOCIATE EDITORS 
M. G. KENDALL JOHN WISHART 


in consultation with 


HARALD CRAMÉR J. B. S. HALDANE
F. N. DAVID H. O. HARTLEY 
R. C. GEARY D. G. KENDALL 




VOLUME 42 


ISSUED BY 


THE BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON 
PRINTED AT THE UNIVERSITY PRESS, CAMBRIDGE 


PRINTED IN GREAT BRITAIN 


CONTENTS OF VOLUME 42 


Memoirs and Miscellanea 


ANDREWS, F. C. and CHERNOFF, H. A large-sample bioassay design with random
doses and uncertain concentration


ANIS, A. A. The variance of the maximum of partial sums of a finite number of
independent normal variates . 


ARMSEN, P. Tables for significance tests of 2 x 2 contingency tables 
BARTLETT, M. S. Approximate confidence intervals. III. A bias correction


BARTLETT, M. S. and MEDHI, J. On the efficiency of procedures for smoothing periodograms
from time series with continuous spectra


BRADLEY, R. A. Rank analysis of incomplete block designs. III. Some large-sample
results on estimation and power for a method of paired comparisons


BROADBENT, S. R. Quantum hypotheses


CHAPMAN, D. G. Population estimation based on change of composition caused by
a selective removal


CHERNOFF, H. (See ANDREWS, F. C.)
CHU, J. T. On bounds for the normal integral


CHU, J. T. The ‘inefficiency’ of the sample median for many familiar symmetric
distributions


COHEN, A. C. Censored samples from truncated normal distributions


COX, D. R. and STUART, A. Some quick sign tests for trend in location and
dispersion


DAVID, F. N. Studies in the history of probability and statistics. I. Dicing and
gaming (a note on the history of probability)


DAVID, F. N. and KENDALL, M. G. Tables of symmetric functions. Part V.
DAVID, H. A. A note on moving ranges


DUNNETT, C. W. and SOBEL, M. Approximations to the probability integral and 
certain percentage points of a multivariate analogue of Student’s t-distribution 


EDITORIAL. The normal probability function: tables of certain area-ordinate ratios
and of their reciprocals


FOSTER, F. G. A note on Bailey's and Whittle's treatment of a general stochastic
epidemic


GANI, J. Some problems in the theory of provisioning and of dams





GANI, J. Some theorems and sufficiency conditions for the maximum-likelihood estimator
of an unknown parameter in a simple Markov chain

GHOSH, M. N. Simultaneous tests of linear hypotheses

GOOD, I. J. The likelihood ratio test for Markoff chains

GULLAND, J. A. On the estimation of population parameters from marked members

HABERMAN, S. Distributions of Kendall's tau based on partially ordered systems

HALDANE, J. B. S. Substitutes for χ²

HALDANE, J. B. S. A problem in the significance of small numbers


HALDANE, J. B. S. The rapid calculation of χ² as a test of homogeneity from a
2 × n table


HANNAN, E. J. Exact tests for serial correlation

HANNAN, E. J. An exact test for correlation between time series
HEINE, V. Models for two-dimensional stationary stochastic processes
HODGES, J. L. Galton's rank-order test


HUITSON, A. A method of assigning confidence limits to linear combinations of
variances


HUZURBAZAR, V. S. Exact forms of some invariants for distributions admitting
sufficient statistics


JAMES, G. S. Cumulants of a transformed variate


JOWETT, G. H. Sampling properties of local statistics in stationary stochastic
series
KENDALL, M. G. (See DAVID, F. N.)


LESLIE, P. H. A simple method of calculating the exact probability in 2 × 2 contingency
tables with small marginal totals


MAUTNER, A. J. (See RUSHTON, S.)

MEDHI, J. (See BARTLETT, M. S.)

MOHAN, C. The gambler’s ruin problem with correlation

PAGE, E. S. Control charts with warning lines

PAGE, E. S. A test for a change in a parameter occurring at an unknown point
POWELL, E. O. Some features of the generation times of individual bacteria


RUSHTON, S. and MAUTNER, A. J. The deterministic model of a simple epidemic for
more than one community


SAMPFORD, M. R. The truncated negative binomial distribution






SIMON, H. A. On a class of skew distribution functions
STUART, A. A test for homogeneity of the marginal distributions in a two-way
classification . . . 412
STUART, A. A paradox in statistical estimation
STUART, A. (See COX, D. R.)
TATE, R. F. The theory of correlation between two continuous variables when one
is dichotomized . . . 205
THOMPSON, H. R. Spatial point processes, with applications to ecology . . . 102
THOMSON, G. W. Bounds for the ratio of range to standard deviation . . . 268
TUKEY, J. W. Interpolations and approximations related to the normal range . . . 480
WATSON, G. S. Serial correlation in regression analysis. I. . . . 327
WAUGH, W. A. O'N. An age-dependent birth and death process . . . 291
WHITTLE, P. The outcome of a stochastic epidemic—a note on Bailey’s paper . . . 116
WILK, M. B. The randomization analysis of a generalized randomized block design . . . 70
WILLIAMS, E. J. Significance tests for discriminant functions and linear functional
relationships . . . 360
WISE, J. The autocorrelation function and the spectral density function
YATES, F. A note on the application of the combination of probabilities test to a set
of 2 × 2 tables . . . 404
YATES, F. The use of transformations and maximum likelihood in the analysis of
quantal experiments involving two treatments . . . 382


Book Reviews 


BARTLETT, M. S. An Introduction to Stochastic Processes with special reference to
Methods and Applications . . . F. N. DAVID 539
BENNETT, C. A. and FRANKLIN, N. L. Statistical Analysis in Chemistry and the
Chemical Industry . . . G. E. P. BOX 542
BICKLEY, W. G. Bessel Functions and Formulae . . . D. E. BARTON 275
BLISS, C. I. and CALHOUN, D. W. Outline of Biometry and Statistics
. . . C. A. B. SMITH 541
DAVIES, O. L. Design and Analysis of Industrial Experiments . . . L. McMULLEN 272
HANSEN, M. H., HURWITZ, W. N. and MADOW, W. G. Sample Survey Methods and
Theory . . . F. N. DAVID 272


HOEL, P. G. Introduction to Mathematical Statistics . . . P. G. MOORE
HOOKER, P. F. and LONGLEY-COOK, L. H. Life and other Contingencies, Vol. 1
. . . N. L. JOHNSON
KEMPTHORNE, O., BANCROFT, T. A., GOWEN, J. W. and LUSH, J. L. Statistics and
Mathematics in Biology . . . C. A. B. SMITH
LOÈVE, M. Probability Theory . . . F. N. DAVID
NATIONAL BUREAU OF STANDARDS. Publications of the U.S. Department of Commerce,
Applied Mathematics Series 9, 30, 35, 36, 40
. . . D. E. BARTON and E. S. PEARSON
THE RAND CORPORATION. A Million Random Digits with 100,000 Normal Deviates
. . . P. G. MOORE
ROYAL SOCIETY. Table of Binomial Coefficients . . . P. G. MOORE
SLUTSKI, E. E. (Ed. KOLMOGOROV, A. N.) Tables for the calculation of the incomplete
Γ-function and the χ²-probability function . . . N. L. JOHNSON
SMITH, C. A. B. Biomathematics . . . M. G. KENDALL
SUKHATME, PANDURANG V. Sampling Theory of Surveys with Applications
. . . J. DURBIN
WALKER, HELEN M. and LEV, J. Statistical Inference . . . F. G. FOSTER
WOLD, H. (with JURÉEN, LARS) Demand Analysis . . . F. G. FOSTER
WOLD, H. (with WHITTLE, P.) A Study in the Analysis of Stationary Time-Series.
2nd edition . . . G. M. JENKINS
WOLFENDEN, H. H. Population Statistics and their Compilation . . . N. L. JOHNSON


BIOMETRIKA PUBLICATIONS: BOOKS OF TABLES 


Issued by the Cambridge University Press, Bentley House, London, N.W. 1 
and obtainable from any bookseller


Tables of the Incomplete B-Function 
EDITED BY KARL PEARSON 
59 pages of Introduction and 494 pages of Tables 


Price: 55s. net


Tables of the Incomplete Γ-Function
EDITED BY KARL PEARSON
31 pages of Introduction and 164 pages of Tables
Price: 42s. net


Tables of the Complete and Incomplete Elliptic Integrals 
(from LEGENDRE’S Traité des Fonctions Elliptiques. With autographed portrait of LEGENDRE)
39 pages of Introduction by KARL PEARSON and 94 pages of Tables
Price: 12s. 6d. net


Tables of the Ordinates and Probability Integral of the
Distribution of the Correlation Coefficient in Small Samples
By F. N. DAVID
38 pages of Introduction, 55 pages of Tables, ro 


Price: 17s. 6d. net


NEW PUBLICATION NOW READY
Biometrika Tables for Statisticians


The two volumes of Tables for Statisticians and Biometricians
issued.


some new tables added.
Volume I of the new series, which includes the statistical and auxiliary mathematical tables in more 
common use is now available. It contains an Introduction and 54 tables covering in all 238 pp. 


Price: 25s. net 


BIOMETRIKA, 42, 1 and 2


NEW STATISTICAL TABLES: SEPARATES RE-ISSUED 
FROM BIOMETRIKA 


To be obtained from 
BIOMETRIKA OFFICE, UNIVERSITY COLLEGE, LONDON, W.C.1 


1. From Biometrika, Vols. 22 and 27 
Tests of Normality. By E. S. PEARSON and R. C. GEARY 
Price One Shilling, post free 


II. From Biometrika, Vol. 32, Part 2, pp. 168-181 and 188-189
(1) Table of percentage points of the incomplete beta-function
(2) Table of percentage points of the χ² distribution
Stitched together with introductory matter. Price Two Shillings and Sixpence, post free


III. From Biometrika, Vol. 32, Parts 3 and 4, pp. 300-310
(1) Table of the probability integral of the range in samples from a normal population 
(2) Table of the percentage points of the range 
(3) Table of the percentage points of the t-distribution 
Stitched together with introductory matter. Price Two Shillings and Sixpence, post free 


IV. From Biometrika, Vol. 33, Part 1, pp. 73-88 
Table of percentage points of the inverted beta (F) distribution 


With introductory matter. Price Two Shillings and Sixpence, post free 


V. From Biometrika, Vol. 33, Part 3, pp. 252-265 


(1) Table of the probability integral of the mean deviation in samples from a normal
population


(2) Table of the percentage points of the mean deviation 
Stitched together with introductory matter. Price Two Shillings and Sixpence, post free 
VI. From Biometrika, Vol. 33, Part 4, pp. 296-304
Table for testing the homogeneity of a set of estimated variances 
With introductory matter. Price Two Shillings, post free 
VII. From Biometrika, Vol. 35, Parts 1 and 2, pp. 145-156 


Table of significance levels for the Fisher-Yates test of significance in 2 × 2 contingency
tables. By D. J. FINNEY


With introductory matter. Price Two Shillings and Sixpence, post free


VIII. From Biometrika, Vol. 35, Parts 1 and 2, pp. 191-201 


Table for the calculation of working probits and weights in probit analysis. By D. J. FINNEY 
and W. L. STEVENS 


With introductory matter. Price Two Shillings and Sixpence, post free 
IX. From Biometrika, Vol. 36, Parts 3 and 4, pp. 267-289 
Tables of autoregressive series. By M. G. KENDALL
With introductory matter. Price Two Shillings and Sixpence, post free
X. From Biometrika, Vol. 36, Parts 3 and 4, pp. 431-449
Tables of symmetric functions. Part I. By F. N. DAVID and M. G. KENDALL
With introductory matter. Price Two Shillings and Sixpence, post free 




NEW STATISTICAL TABLES: continued 


XI. From Biometrika, Vol. 35, Parts 1 and 2, pp. 118-144 
The distribution of the extreme deviate from the sample mean and its “studentized” form. 


By K. R. NAIR. Price Two Shillings and Sixpence, post free


XII. From Biometrika, Vol. 37, pp. 168-172 and pp. 313-325 
(1) Table of the probability integral of the t-distribution 
(2) Table of the χ² integral, and of the cumulative Poisson distribution. By H. O. HARTLEY
and E. S. PEARSON
Stitched together with introductory matter. Price Five Shillings, post free
XIII. From Biometrika, Vol. 38, Parts 1 and 2, pp. 112-130
Charts of the power function for analysis of variance tests, derived from the non-central
F-distribution. By E. S. PEARSON and H. O. HARTLEY
With introductory matter. Price Two Shillings and Sixpence, post free
XIV. From Biometrika, Vol. 38, Parts 3 and 4, pp. 435-462
Tables of symmetric functions. Parts II and III. By F. N. DAVID and M. G. KENDALL
With introductory matter. Price Four Shillings, post free 
XV. From Biometrika, Vol. 38, Parts 3 and 4, pp. 423-426 
A chart for the incomplete beta-function and the cumulative binomial distribution. By H. O. 
HARTLEY and E. R. FITCH 
With introductory matter and ruler scale. Price Two Shillings and Sixpence, post free 
XVI. From Biometrika, Vol. 40, Parts 1 and 2, pp. 70-73 
Tables of the angular transformation. By W. L. STEVENS 
With introductory matter. Price One Shilling, post free 
XVII. From Biometrika, Vol. 40, Parts 1 and 2, pp. 74-86 
Tests of significance in a 2 × 2 contingency table: extension of Finney's table (No. VII).
Computed by R. LATSCHA
With introductory matter. Price Two Shillings and Sixpence, post free
XVIII. From Biometrika, Vol. 40, Parts 3 and 4, pp. 427-446
Tables of symmetric functions. Part IV. By F. N. DAVID and M. G. KENDALL
With introductory matter. Price Four Shillings, post free
XIX. From Biometrika, Vol. 41, Parts 1 and 2, pp. 253-260
Tables of generalized k-statistics. By S. H. ABDEL-ATY
With introductory matter. Price Two Shillings, post free 


No. XI is out of print. 


ELEMENTARY STATISTICAL EXERCISES, PART 1 


These exercises are based on the numerical class work which accompanies the first
year lectures to undergraduate students taking their B.Sc. degree in Statistics of
the University of London. They cover the more elementary univariate theory,
regression and correlation, χ² and significance tests, sampling distributions and
quality control.


The book contains 91 pp. and is reproduced from typescript by a photographic
process.
Price: 6s. 6d.


ISSUED BY THE DEPARTMENT OF STATISTICS, UNIVERSITY COLLEGE, 
GOWER STREET, LONDON, W.C.1




BIOMETRIKA PUBLICATIONS 


Issued by the Cambridge University Press, Bentley House, London, N.W.1 
and obtainable from any bookseller 


The Life, Letters and Labours of Francis Galton, Vols. I, II, IIIA, & IIIB
By KARL PEARSON, F.R.S. Price £3. 3s. net 


Karl Pearson: An Appreciation of Some Aspects of his Life and Work 
By E. S. PEARSON Price 10s. 6d. net 


A Bibliography of the Statistical and Other Writings of Karl Pearson 
Compiled by G. M. MORANT, with the assistance of B. L. WELCH Price 6s. net 


“Student’s” Collected Papers Edited by E. S. PEARSON and JOHN WISHART
with a Foreword by LAUNCE MCMULLEN Price 15s. net 


Karl Pearson’s Early Statistical Papers 

Reprinted by photo-lithography for the Biometrika Trust, with the permission of the original publishers. The
Volume contains eleven papers, including the more important of the memoirs entitled “Mathematical Contributions
to the Theory of Evolution”, first published in the Philosophical Transactions of the Royal Society. The
original paper deriving the χ²-distribution, published in 1900 in the Philosophical Magazine, is also included.
Price 21s. net 


ROYAL STATISTICAL SOCIETY 


THE JOURNAL OF THE ROYAL STATISTICAL SOCIETY is published in two series: SERIES A
(GENERAL), four issues a year, 15s. each part, annual subscription £3. 1s. post free; SERIES B
(METHODOLOGICAL), two issues a year, 22s. 6d. each part, annual subscription 45s. 6d. post
free.


Series A (GENERAL), VOL. 117, PART 4, 1954
Callbacks and Clustering in Sample Surveys: An Experimental Study (with Discussion). By J. DURBIN and A.
STUART—The Phenomenon of Labour Turnover. By H. SILCOCK—The Relations between Chained Indexes of
Input, Gross Output and Net Output. By P. H. KARMEL—The Mode of a Grouped Distribution. By H. M.
FINUCAN—First International Conference of National Committees on Vital and Health Statistics—Annual Report
of the Council— Proceedings of the One Hundred and Twentieth Annual General Meeting—Obituary: Sir HENRY 


um A. M. SOUTHALL—Reviews of Books, Statistical and Current Notes, Additions to Library, Periodical
Returns.


ROYAL STATISTICAL Society, 21 BENTINCK STREET, LONDON, W.1 




BIOMETRICS 


Journal of the Biometric Society 


VOL. 11, No. 1 TABLE OF CONTENTS MARCH, 1955


Multiple Range and Multiple F Tests. DAVID B. DUNCAN—Further Contributions to the Theory of Paired Comparisons.
M. G. KENDALL—Comparative Sensitivity of Pair and Triad Flavor Intensity Difference Tests. J. W.
HOPKINS and N. T. GRIDGEMAN—The Description of Genic Interactions in Continuous Variation. B. I. HAYMAN
and K. MATHER—Quantitative Studies in Diphtheria Prophylaxis: An Attempt to Derive a Mathematical Characterization
of the Antigenicity of Diphtheria Prophylactic. L. B. HOLT—Prediction Equations in Quantitative
Genetics. ALAN ROBERTSON—Determining the Fruit Count on a Tree by Randomized Branch Sampling. RAYMOND
J. JESSEN—A Further Note on Missing Data. H. W. NORTON.


VOL. 11, No. 2 JUNE, 1955


Rule of Thumb for Determining Expectations of Mean Squares in Analysis of Variance. E. F. SCHULTZ, JR.—
Variance Components with Reference to Genetic Population Parameters. DOROTHY C. LOWRY—Use of the
Simplex Design in the Study of Joint Action of Related Hormones. P. J. CLARINGBOLD—An Analysis of Perennial
Crop Data. R. G. D. STEEL—Statistical Analysis of Multiple Slope Ratio Assays. C. G. BARRACLOUGH—A First
Course in Biometry for Graduate Students in Biology. C. I. BLISS—The Theory of Bacterial Constant Growth
Apparatus. C. C. SPICER—An Inverted Matrix Approach for Determining Crop-Weather Regression Equations.
HAROLD F. HUDDLESTON—Fitting the Neyman Type A (Two Parameter) Contagious Distribution. J. B. DOUGLAS.


———-— 


Annual subscription rates to non-members are as follows: For American Statistical Association Members, 
$4.00; for subscribers, non-members of either American Statistical Association or the Biometric Society, 
$7.00. Subscriptions should be sent to the 
MANAGING EDITOR, BIOMETRICS 
P.O. BOX 5457, RALEIGH, NORTH CAROLINA, U.S.A.


TRABAJOS DE ESTADISTICA 


REVIEW PUBLISHED BY INSTITUTO DE INVESTIGACIONES ESTADÍSTICAS 
OF THE CONSEJO SUPERIOR DE INVESTIGACIONES CIENTÍFICAS 
MADRID, SPAIN 


CONTENTS 

Vol. V, Cuad. II

P. ZOROA—Superposición de variables aleatorias y sus aplicaciones.

S. VAJDA—A Problem of Encounters.

J. NEYMAN—Sur une famille de tests asymptotiques des hypothèses statistiques composées.

NOTAS.

CHARLES A. SPOERL—La ciencia actuarial: Una visión general de su desarrollo teórico.

J. ROYO Y S. FERRER—Tabla de números aleatorios obtenida de los números de la Lotería
Nacional Española.

E. LLAGUNEZ Y J. L. TENDERO—Notas para la enseñanza de algunos conceptos elementales de
Estadística en el Bachillerato.

CRÓNICAS. BIBLIOGRAFÍA. CUESTIONES.


For everything in connection with works, exchanges and subscription write to Professor Sixto 
Ríos, Instituto de Investigaciones Estadísticas del Consejo Superior de Investigaciones Científicas 
(Serrano, 123) Madrid, Spain. The Review is composed of three fascicles published three times 
a year (about 350 pages), and its annual prices are 80 pesetas for Spain and South America and 
$3.00 for all other countries. 


Annals of Human Genetics 


Formerly ANNALS OF EUGENICS 
Edited by L. S. PENROSE 


Vol. 19, Pt. 3 CONTENTS January, 1955 


A. C. STEVENSON—Muscular dystrophy in Northern Ireland. II. An account of nine additional families.
III. Linkage data with particular reference to autosomal limb girdle muscular dystrophy. N. A. BARNICOT,
M. S. C. BIRBECK and F. W. CUCKOW—The electron microscopy of human hair pigments. F. N. DAVID—The
transformation of discrete variables. R. GRUBB and S. SJÖSTEDT—Blood groups in abortion and sterility.
H. HARRIS, URSULA MITTWOCH, ELIZABETH B. ROBSON and F. L. WARREN—The pattern of amino-acid secretion in
cystinuria. M. CAMPBELL and P. E. POLANI—An aetiological study of congenital heart disease. REVIEWS.


Vol. 19, Pt. 4 May, 1955 


B. WOOLF—On estimating the relation between blood group and disease. J. GUREVITCH, E. HASSON, E. MARGOLIS
and C. POLIAKOFF—Blood groups in Jews from Cochin, India. J. GUREVITCH and E. MARGOLIS—Blood groups in
Jews from Iraq. J. GUREVITCH, E. HASSON, E. MARGOLIS and C. POLIAKOFF—Blood groups in Jews from Tripolitania.
ELIZABETH B. ROBSON—Birth weight in cousins. HELEN M. RANNEY and SALOME GLUECKSOHN-WAELSCH—
Filter paper electrophoresis of mouse hemoglobin. Preliminary note. L. S. PENROSE and SHEILA MAYNARD SMITH—
Monozygotic and dizygotic twin diagnosis. J. W. H. LUGG and J. M. WHYTE—Taste thresholds for PTC of some
population groups. SYLVIA D. LAWLER and J. H. RENWICK—Genetical linkage between the ABO group and the
nail-patella locus. E. C. R. REEVE—Inbreeding with the homozygotes at a disadvantage. REVIEWS AND SHORTER
NOTICES.—INDEX TO VOLS. 15-19.


Subscription price 57s. net per volume of four quarterly parts. Single issues 15s. (postage extra) 


CAMBRIDGE UNIVERSITY PRESS 
BENTLEY HOUSE, 200 EUSTON ROAD, LONDON, N.W. 1 


The Annals of Mathematical Statistics 


The Official Journal of the Institute of Mathematical Statistics 


VOL. 26, NO. 2 CONTENTS JUNE, 1955 


On Tests of Normality and Other Tests of Goodness of Fit Based on Distance Methods. M. KAC, J. KIEFER and
J. WOLFOWITZ—Some Classes of Partially Balanced Designs. R. C. BOSE and W. H. CLATWORTHY—The
Distribution of Length and Components of the Sum of n Random Unit Vectors. J. ARTHUR GREENWOOD and
DAVID DURAND—On the Efficiency of Experimental Designs. SYLVAIN EHRENFELD—On the Approximation of
a Distribution Function by an Empiric Distribution. JEROME BLACKMAN—The Extrema of the Expected Value
of a Function of Independent Random Variables. WASSILY HOEFFDING—Estimates of Bounded Relative Error
in Particle Counting. M. A. GIRSHICK, H. RUBIN and R. SITGREAVES—A Characterization of Sufficiency. R. R.
BAHADUR—Distribution of the Maximum of the Arithmetic Mean of Correlated Random Variables. JOHN
GURLAND—Significance Probabilities of the Wilcoxon Test. EVELYN FIX and J. L. HODGES, JR.—On the Fourier
Series Expansion of Random Functions. W. L. ROOT and T. S. PITCHER—A Characterization of the Gamma
Distribution. EUGENE LUKACS—The Ratio of Variances in a Variance Components Model. W. A. THOMPSON, JR.
—On Moments of Order Statistics from the Weibull Distribution. JULIUS LIEBLEIN—On the Asymptotic
Normality of Some Statistics Used in Non-parametric Tests. MEYER DWASS—Empirical Power Functions for
Nonparametric Two-sample Tests for Small Samples. D. TEICHROEW—NOTES: A Note on the Theory of
Unbiased Estimation. D. BASU—On Confidence Intervals of Given Length for the Mean of a Normal Distribution
with Unknown Variance. LIONEL WEISS—NEWS and NOTICES—REPORT OF THE TREASURER OF THE INSTITUTE—
PUBLICATIONS RECEIVED.


Subscription rate $12.00 per year in the United States and Canada and $10.00 per year elsewhere 


INQUIRIES AND SUBSCRIPTION ORDERS SHOULD BE SENT TO 


ALBERT H. BOWKER, TREASURER, INSTITUTE OF MATHEMATICAL STATISTICS
SEQUOIA HALL, STANFORD, CALIFORNIA 




ECONOMETRICA 


JOURNAL OF THE ECONOMETRIC SOCIETY 


Contents of Vol. 23, No. 2, April 1955, include: 


HAROLD LYDALL. The Life Cycle in Income, Saving, and Asset Ownership 

J. D. SARGAN. The Period of Production 

A. J. GARTAGANIS and A. S. GOLDBERGER. A Note on the Statistical Discrepancy in the National 
Accounts 

GEORGE B. DANTZIG. Upper Bounds, Secondary Constraints, and Block Triangularity in Linear 
Programming 

H. THEIL. Recent Experiences with the Munich Business Test

HERBERT A. SIMON. Causality and Econometrics: Comment

HERMAN O. A. WOLD. Causality and Econometrics: Reply

REPORT ON THE UPPSALA MEETING

BOOK REVIEWS, NOTES AND ANNOUNCEMENTS


Published Quarterly Subscription rate available on request 


The Econometric Society is an international society for the advancement of economic theory in its relation to 
statistics and mathematics. 
Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying for 
membership should be addressed to
RICHARD RUGGLES, Secretary 
THE ECONOMETRIC SOCIETY, BOX 1264, YALE UNIVERSITY 


NEW HAVEN, CONNECTICUT, U.S.A. 


SANKHYA 


THE INDIAN JOURNAL OF STATISTICS 
EDITED BY P. C. MAHALANOBIS 


Vol. 14, part 4, 1955 CONTENTS 
Some Contributions to the Design of Sample Surveys. By TOSIO KITAGAWA

On Optimum Selections from Multivariate Populations. By DES RAJ

On a Characterisation of the Multivariate Normal Distribution. By R. G. LAHA

Bi-modal Distributions derived from the Normal Distribution. By AYODHYA PRASAD

Selection Tests for Skilled and Semi-Skilled Workers. By S. P. GHOSH

Optimum Dilution Levels for the Estimation of Bacterial Density in Water. By S. J. POTI, P. SINHA and M. V.
RAMAN

A Suggested Application of Wald's Sequential Analysis to Railway Operations. By JAGJIT SINGH


INDIAN STATISTICAL INSTITUTE: Twenty-second Annual Report, 1953-54 


SUBSCRIPTION    CURRENT                    BACK NUMBERS
                per volume   per issue     per volume   per issue
INDIA           Rs. 30       ‚1            Rs. 45       Rs. 12/8
FOREIGN         $10.00       $3.50         $15.00       $4.50


Subscriptions and orders for back numbers should be sent to


Statistical Publishing Society
204/1 Barrackpore Trunk Road, Calcutta-35




AMERICAN STATISTICAL ASSOCIATION 
1108 16th St., N.W. Washington 6, D.C. 


THE AMERICAN STATISTICAL ASSOCIATION


announces 
the publication of two new monographs: 


STATISTICAL PROBLEMS OF THE KINSEY REPORT by 
Cochran, Mosteller and Tukey. The evaluation of the statistical methodo- 
logy used by Kinsey and his associates in their first volume. This study was 
requested by the Committee for Research in the Problems of Sex of the 
National Research Council, which is sponsoring Dr Kinsey's work. The 
contents include Statistical Problems of the Kinsey Report, Discussion of 
Comments by Selected Technical Reviewers, Comparison with Other 
Studies, Proposed Further Work, Probability Sampling Considerations, 
The Interview and The Office, Desirable Accuracy, Principles of Sampling. 

The monograph contains 331 pages, plus a foreword, preface and index; 
bound in blue buckram. Price $3.00 to ASA members; $5.00 to others. 


PROCEEDINGS OF THE BUSINESS AND ECONOMICS 
STATISTICS SECTION. The papers given at the sessions sponsored by 
the Business and Economics Statistics Section at the Annual Meeting of 
the American Statistical Association in Montreal in September 1954. This 
volume contains papers on Pension Funds, Business Outlook, International 
Payments, Consumer Survey Data, Forecasting, Employment and 
Unemployment Statistics, Stock Market, Government Statistics, Measure- 
ment of Saving and Investment, Mobilization, Productivity. 

Approximately 250 pages, paper bound. Price $2.00 to ASA members; 
$3.00 to others. 


Copies may be ordered directly from the American Statistical Association, 
1108 Sixteenth Street, N.W., Washington 6, D.C. 


THE AMERICAN STATISTICAL ASSOCIATION 
INVITES AS MEMBERS ALL PERSONS INTERESTED IN: 


1. Development of new theory and method. 


2. Improvement of basic statistical data. 
3. Application of statistical methods to practical problems.




VOLUME 42, PARTS 1 AND 2 JUNE 1955


STUDIES IN THE HISTORY OF PROBABILITY AND STATISTICS* 
I. DICING AND GAMING (A NOTE ON THE HISTORY OF PROBABILITY)


By F. N. DAVID 
University College, London 


‘See, this is new? It hath been already of old time.’ (Ecclesiastes i. 10.) 


1. A cynical archaeologist remarked recently that a symptom of decadence in a civilization
is when men become interested in their own history, and he added that in the unlikely
eventuality of any proof being required of the decadence of this phase of Homo sapiens it
could be found in the present-day interest in archaeology. Most generalizations of a
sweeping character such as this are unacceptable, chiefly because there is no way of putting
them to proof; but the present interest of scientists in general, and of statisticians in
particular, in the origins of scientific thought, is far from implying the decadence of science,
whatever may be implied by an interest in the arts.

It is inviting, and at the same time profitless, to speculate why modern scientists have
such an interest. The possibility of deciding priority of discovery which concerned the
Victorian scientist so closely does not cause much controversy to-day, for the modern
scientist would hold that to ascribe any discovery in the field of science to any single person
is unrealistic. Thus, while we are taught at school, for example, that Newton and Leibnitz
separately and independently ‘discovered’ the differential calculus, it would perhaps be
more appropriate to say that Newton and Leibnitz each supplied the last link in the chain
of reasoning which gave us the differential calculus—a chain which can be traced back
through Pierre Fermat, Barrow, Torricelli and Galileo, and that it is surprising that there
were only two mathematicians who did this.

Mathematics is essentially an expression of thought in which we build on the mental 
effort of our forerunners, and probability is no exception to this general rule. The real 
difficulty we meet with in trying to trace probability back to its origins is that it started 
essentially as an empirical science and developed only lately on the mathematical side. It 
is hard to say where in time the change came from empiricism to mathematical formalism 
as it appears to have taken place over hundreds of years; and the claims put forward for 
Pascal and Fermat as the creators of probability theory cannot entirely be substantiated. 


2. When man first started to play games of chance is a time problem we shall never 
clearly resolve. We may place on record that it is a commonplace thing for archaeologists 
to find a preponderance of astragali† among the bones of animals dug up on prehistoric
sites. One archaeologist stated that he had found up to seven times as many as any other 
bone, another put the figure at 500 (sic!), while yet a third, refusing to be drawn to a figure, 
stated that they were many. This fact has probably little significance. The astragalus has 
little marrow in it and was possibly not worth cracking for the sake of its contents as were 
the long bones; it is knobbly and presents no flat curves for drawing as does the shoulder 


* [Editorial note. It is hoped to publish articles by a number of different authors under this
general heading.]
† The astragalus is a small bone in the ankle, immediately under the talus or heel-bone. See Pl. 2a.






blade for example. All we may do is to place on record that round about 40,000 years ago 
there were large numbers of the astragali of sheep, goat and deer lying about. 

The astragali of animals with hooves are different from those with feet such as man, dog 
and cat. From the comparison in Text-fig. 1 we note how in the case of the dog the 
astragalus is developed on one side to allow for the support of the bones of the feet. The 
astragalus of the hooved animal is almost symmetrical about a longitudinal axis and it is 
a pleasant toy to play with. In France and Greece children still play games with them in 
the streets, and it is possible to buy pieces of metal fashioned into an idealized shape but 
still recognizable as astragali. 


Text-fig. 1. Drawings of the astragalus in sheep and dog, natural size.


3. Some time between prehistoric man of four hundred centuries ago and the beginning
of the third millennium before Christ Homo sapiens invented games and among these
games, games of chance. We know from paintings, terra-cotta groups, etc., that the
astragalus was used in Greece like the ancient quoit,* but there is no doubt from paintings
on tombs in Egypt and excavated material that the use of the astragalus in games where
it is desired to move counters was well established by the time of the First Dynasty. In
one painting a nobleman, shown playing a game in his after-life, delicately poises an
astragalus on his finger tip, a board with ‘men’ in front of him. A typical game of
c. 1800 B.C. is that of ‘Hounds and Jackals’ illustrated in Pl. 1. The game seems similar
to our present-day ‘Snakes and Ladders’; the hounds and jackals were moved according to
some rule by throwing the astragali found with the game and shown in the figure. Variants
of this game were undoubtedly played from the time of the First Dynasty (c. 3500 B.C.).

It is possible but not altogether likely that these games originated in Egypt. They cer- 
tainly did not originate in Greece, as has been claimed for reasons which we shall give later. 
However, Herodotus, the first Greek historian, like his present-day counterparts, was 
willing to believe that the Greeks (or allied peoples) had invented nearly everything. His 
claim that the Lydians introduced coinage has about as much foundation as his claim
regarding games of chance. He writes (c. 500 B.C.) about the famine in Lydia (which was
c. 1500 B.C.) as follows:


The Lydians have very nearly the same customs as the Greeks. They were the first nation to
introduce the use of gold and silver coins and the first to sell goods by retail. They claim also the 
invention of all games which are common to them with the Greeks. These they declare that they 
invented about the time that they colonized Tyrrhenia, an event of which they give the following 
account. In the days of Atys, the son of Manes, there was great scarcity through the whole land of 
Lydia. For some time the Lydians bore the affliction patiently, but finding that it did not pass away, 


* From the name ‘knucklebone’ we might infer that among the early games were those in which the
astragali were balanced on the bones of the knuckles and then tossed and caught again. 


Biometrika, Vol. 42, Parts 1 and 2
David: Studies in the History of Probability and Statistics

Plate 1 (facing p. 2)

Plate 2


they set to work to devise remedies for the evil. Various expedients were discovered by various 
persons; dice and huckle-bones (i.e. astragali) and ball and all such games were invented, except 
tables,* the invention of which they do not claim as theirs. The plan adopted against famine was to 
engage in games on one day so entirely as not to feel any craving for food, and the next day to eat
and abstain from games. In this way they passed eighteen years.


In yet another commentary we are told that games of chance were invented during the 
Trojan war by Palamedes. During the 10-year investment of the city of Troy various 
games were invented to prevent the soldiers’ morale suffering from boredom. 


4. The game of ball is mentioned by Homer and according to Plato was evolved in 
Egypt. It is not, however, a game of chance. The story of dice we shall return to, but we 
may first carry the story of the astragalus a little further. In the early part of the first 
millennium it would seem that astragali were used by both adults and children for their 
leisure games. Homer (c. 900 B.C.) tells us that when Patroclus was a small boy he became
so angry with his opponent while playing a game of knucklebones that he nearly killed him. 
Another writer of the same period tells us that students played knucklebones everywhere, 
that they were acclaimed as presents and that as a prize for handwriting one student was 
given eighty astragali all at once! It is not difficult to imagine the small boys of that era 
collecting astragali as they collected marbles, much as the boys of our own era still do. 

That the astragalus was used commonly in the gaming which the Greeks and later the 
Romans conducted with zeal and passion, the references in the literature of that time leave 
no room for doubt. One of the chief games may have been the simple one of throwing four 
astragali together and noting which sides fell uppermost. The astragalus has only four sides 
on which it may rest, and it has been deduced, among others by Nicolas Leonicus Thomeus 
(1456-1531), that a common method of enumeration was that the upper side, broad and 
slightly convex counted 4, the lower side broad and hollowed 3, the lateral side narrow and 
flat 1 and the other lateral which is slightly hollow 6. These aspects of a sheep’s astragalus 
are shown in Pl. 2a. (With present-day astragali the probabilities of scoring 1 and 6 are each 
approximately equal to 1/10 and those of 3 and 4 approximately 4/10.) The worst throw 
for the Greeks with one bone was unity which they called the dog, and sometimes the 
vulture. The best of all throws with four knucklebones was the throw of Venus when all 
four sides were different which has an actual probability of about 1/26. But at different 
times and in different games the numbers must have been varied, for the throw of Euripides 
with four astragali, discussed by several fifteenth-century writers, was worth 40. How the
bones fell to achieve this result is not stated, although Cardano, writing in the sixteenth
century, states that it was four fours. (Probability c. 1/39.)
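
The two probabilities quoted in this paragraph can be checked from the approximate single-bone figures given there (1/10 for the sides scoring 1 and 6, 4/10 for those scoring 3 and 4). The following is only a sketch: it assumes that the four bones fall independently and that the quoted figures are exact, and the function names are illustrative rather than taken from any source.

```python
from fractions import Fraction
from math import factorial

# Approximate probabilities of each resting side of a present-day astragalus,
# keyed by the score of the side, as quoted in the text (treated here as exact).
SIDE_PROB = {
    1: Fraction(1, 10),
    3: Fraction(4, 10),
    4: Fraction(4, 10),
    6: Fraction(1, 10),
}


def venus_probability(probs=SIDE_PROB):
    """Probability that four independent bones show four different sides."""
    product_of_sides = Fraction(1)
    for p in probs.values():
        product_of_sides *= p
    # The four distinct sides may be distributed over the four bones in 4! ways.
    return factorial(4) * product_of_sides


def four_fours_probability(probs=SIDE_PROB):
    """Probability that all four independent bones show the side scoring 4."""
    return probs[4] ** 4


venus = venus_probability()
euripides = four_fours_probability()
print("Venus:", venus, "about 1 in", round(float(1 / venus), 1))
print("Four fours:", euripides, "about 1 in", round(float(1 / euripides), 1))
```

Under these assumptions the throw of Venus has probability 24/625 and four fours 16/625, in agreement with the rounded values of 1/26 and 1/39 given above.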


5. In classical Rome the astragalus was imitated in carved stone with figures and scenes 
incised on the sides. A typical example is illustrated in Pl. 2b. Stone astragali have also
been found in Egypt. At this time too we have the production of lewd figures in metal or 
bone varying in size from about 1 cm. to over 1 in. in height. That these figures were used 
for gaming may be deduced from the fact that the six possible positions in which the figure 
may fall are each marked with a number of dots. 

Besides the astragali it appears possible that throwing sticks were also used for games of 
chance, although it may be that they had a greater religious significance; we shall return 

* This may have been an early form of backgammon or may have been shuffle-board.

† I have not tested these figures for bias. They are a development, I think, of dice rather than astragali.
to this point later. The throwing stick was made of wood or ivory and was often approxi-
mately 3 in. in length with cross-section when square of about 1 cm. each way. Such
throwing sticks were known to the ancient Britons, to the Greeks, the Romans, the
Egyptians and to the Maya Indians of the American continent. Sometimes the sticks are
elliptical in cross-section with major axis of approximately 1 cm., but they are all alike in
having only four numbers on them, one at each end of the upper face and one at each end
of the lower parallel face. In the European throwing sticks the majority of numbers are
marked by small engraved circles, but they are sometimes indicated by cuts in the wood
or ivory. The Maya throwing sticks are marked by coloured scratches in ivory. The actual
numbers marked vary. They are mostly 1, 2, 5, 6, but 3 and 4 have also been noticed.
These throwing sticks are of little importance in gaming. They are mentioned because it is
interesting to note the likelihood that gaming originated at many points, and, although
this is a remark one could not defend, that it possibly was originally a debasement of
religious ceremony.


6. The six-sided die may have been obtained from the astragalus by grinding it down
until it formed a rough cube. The Musée de Louvre has several astragali which have been
treated in this way, but one cannot imagine they formed very satisfactory dice. The
honeycombed (or cancellous) bone tissue has been exposed in several places and the crude
die would clearly not have a long life. Whether the die was evolved in this way or not the
evolution must have taken place some considerable time before Christ. The earliest die
known was excavated in northern Iraq and is dated at the beginning of the third millennium.
It is described as being of well-fired buff pottery. The dots are arranged as shown in Text-
fig. 2(i), the edges at A and B being imagined folded away from the reader. It will be noted
that the opposite points are in consecutive order, 2 opposite 3, 4 opposite 5 and 6 opposite 1.

A die excavated in Mohenjo-Daro (Ancient India) is also dated as third millennium, and 
it is also made of hard buff pottery. The order of the points is again consecutive, but this 
time we have 1 opposite 2, 3 opposite 4, 5 opposite 6. Few other dice have been recorded in 
this millennium. At the time of the XVIIIth dynasty in Egypt (c. 1400 B.C.) a die with the
markings shown in Text-fig. 2(ii) must have been in play. The arrangement of the five dots
is unusual. Somewhere about this time, however, the arrangement of the numbers settled
down to the familiar two-partitions of 7 opposite one another as shown in Text-fig. 2 (vi), 
which arrangement has persisted until to-day. Out of records, collected by the present 
writer, of some fifty dice of classical times made of crystal, ivory, sandstone, ironstone, 
wood and other materials, forty had the two-partition of 7 arrangement. A twelfth-century 
(A.D.) Greek bishop wrote that this was the way in which a die should be marked, and a
sixteenth-century gambler theorizes that this arrangement was chosen to make it easy to
check whether all the numbers had been marked on the die and no figure duplicated at the
expense of leaving out another. One die of the first millennium is said to have 9 opposite 6,
5 opposite 3, and 4 opposite 2. It may have been especially made for a particular game; 
alternatively, it is possible that it may have some ceremonial significance. This is possibly 
also true of a die marked as in Text-fig. 2 (iii), although it might have been a die used for 
cheating. 


7. That dice were used in Egypt is clear from the XVIIIth-dynasty specimen. It is 
thought, however, that dicing did not become common until the advent of the Ptolemaic
dynasty (300 to 30 B.C.) which originated from Greece. Several dice are known of this
period including a beautiful specimen in hard brown limestone of side c. 1 in., which has
the sacred symbols of Osiris, Horus, Isis, Nebhat, Hathor and Horhudet engraved on the
six sides. This would almost undoubtedly have been used for some form of divination rites.


(vi) Rock crystal   (vii) Iron   (viii) Marble
Text-fig. 2


Dice in Britain were in a very primitive state at the time of Christ. The pieces used then 
were formed by roughly squaring the long bone of an animal and cutting it into sections 
to form objects approximately cubical in shape. The marrow was taken out leaving a hollow 
square cylinder of which the cross-section in diagrammatic form is sketched in Text-fig. 2 (iv). 
(The British Museum has two of these.) These primitive dice had the two-partition of 7
arrangement, 3 being opposite 4 on the hollow ends. Some dice had a 3 on each end and 
no 4. Several dice of this kind have been excavated in the chalk and flint country and dated 
late in the first millennium. 

The working out of the geometry of solid figures by Greek mathematicians appears to 
have been followed almost immediately by the construction of polyhedral dice. A beautiful 
icosahedron in rock crystal now at the Musée de Louvre is the most famous of these. (In 
the diagram of Text-fig. 2(v) it may be imagined the outline is folded away from the 
reader.) A figure with 19 faces badly cut but apparently imagined to be rectangular has a 
roman digit on each face from I to X, and above that the numbers rise by tens to C. The
number LXXX is missing, but the number XX appears twice on one face. There was also
a die of 18 faces probably formed by beating out a cubical die, a die with 14 faces and so on.
Faked dice were also not unknown. Apart from the device of leaving out one number 
and duplicating another it is stated that hollow dice have been found dating from Roman 
times. The unity on the face of the die forms a small round plate which can be lifted. It is 
suggested that a small ball of leather could be crammed into the hollow of the die through 
this hole in such a way as to cause the die to tend to fall in a predetermined manner. 


8. Gaming reached such popularity with the Romans that it was found necessary to 
promulgate laws forbidding it except at certain seasons. What game was played by the 
common people we do not know, but there are many references to those played by the 
emperors. In Suetonius's Life of Augustus (Loeb's translation) we find: 


He (Augustus) did not in the least shrink from a reputation for gaming and played frankly and 
openly for recreation, even when he was well on in years, not only in the month of December, but on 
other holidays as well and on working days too. There is no question about this, for in a letter in his 
own handwriting he says, ‘I dined, dear Tiberius, with the same company;...We gambled like old 
men during the meal both yesterday and to-day, for when the dice were thrown whoever turned up 
the “dog” or the 6, put a denarius in the pool for each one of the dice, and the whole was taken by 
anyone who threw the Venus’. 


There are several other references to gaming in this Life. Whether the word talis should be 
translated as dice or astragali (knucklebones) is a moot point. The die as we know it is 
usually referred to as ‘tessera’. The astragalus is often called the talus (or heel-bone), and 
this is the word Suetonius actually used. From the description of the play it would seem 
appropriate to read knucklebones for dice. 

In Suetonius's Life of Claudius we are told that Claudius was so devoted to dicing that 
he wrote a book about it, and that he used to play while driving, throwing on to a board
fitted especially in his carriage. From another source we learn that he played right hand
against left hand.


9. These two instances are chosen to illustrate the passion for gaming which apparently 
possessed the Romans, and it is possible to cite many others. The question which constantly 
recurs to one while studying these games of the past is ‘Why did not someone notice the 
equi-proportionality property of the fall of the die?’ It is understandable that no theory 
was made to describe the fall of the astragalus. But the Greeks had performed the neces- 
sary abstraction of thought to make the mathematical idealization of the cube (and other 
solid figures); at first sight it seems curious that mathematicians did not then go on a little 
further and give equal weight to each side of the cube and so on. For if dicing and gaming 
generally were carried on by so many persons for so long that it was thought necessary to 
prohibit them, surely someone must have noticed that with a cube on the average any one 
side turned up as frequently as any other? We can only make guesses on this point, but it 
would seem to the writer that there are two possible explanations, the imperfections of the 
dice and their use in religious ceremonies. 

10. Imperfect dice. We speak of a true or a fair die nowadays when we mean that there 
is no bias apparent when the die is thrown. In Roman times, and presumably earlier, it seems 
to have been the exception rather than the rule for the die to be true. Many dice of the 
classical period have been thrown by the writer and they were nearly all biased but not all 
in the same way. For example, three classical dice from the British Museum gave the 
results shown in the table from 204 throws each. The arrangements of the pips on the dice
were as in Text-fig. 2(vi), (vii) and (viii). The rock crystal is a beautifully made die; the
others are a little primitive, and the sides of the iron die are only approximately parallel. 
The marble and the iron dice are obviously biased, and this was true of many of the other 
dice examined. A photograph of a wooden die of the classical period is given in Pl. 2(c). 
It will be noted that one of the faces shown is not square, and the impression one has is that 
the owner picked up a piece of wood of convenient shape, smoothed it a little and engraved 
the pips. It would therefore have been difficult, except over a long period, to notice any 
regularity. 


Number of pips ...     1     5     6

Rock crystal          30    34    37
Iron                  35    37    42
Marble
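
The text does not say how the bias of these dice was judged. One conventional check, introduced here purely as an assumption and not as the writer's method, is Pearson's chi-square statistic against equal expected counts (204/6 = 34 per face). The counts in the sketch are hypothetical and are not those of the British Museum dice.

```python
def chi_square_uniform(counts):
    """Pearson chi-square statistic of observed face counts against a fair die."""
    total = sum(counts)
    expected = total / len(counts)  # equal expected count for every face
    return sum((observed - expected) ** 2 / expected for observed in counts)


# HYPOTHETICAL counts for faces 1..6 from 204 throws, for illustration only.
HYPOTHETICAL_COUNTS = [24, 30, 34, 34, 38, 44]

statistic = chi_square_uniform(HYPOTHETICAL_COUNTS)
print(round(statistic, 2))
```

With six faces there are five degrees of freedom, for which the conventional 5 per cent point of chi-square is 11.07; a statistic above that value would ordinarily be taken as evidence of bias.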


11. Divination. In spite of the imperfections of the dice it is probable that some theory 
might have been made if magic or religion or both had not been involved also. A scheme 
whereby the deity consulted is given an opportunity of expressing his wishes appears to be 
a fundamental in the development of all religions. As late as 1737 we have John Wesley 
deciding by the drawing of lots whether to marry or not (John Wesley's Journal, vol. 1, 
1737, Friday, 4 March), and in the practices of present-day primitive tribes we get an echo
from the classical era. At that time pebbles of diverse shapes and colours, arrows, astragali 
and dice were all used to probe the divine intention. In the temples there were various and 
varied rites attached to the process of divination by lot, but the main principle was the 
same. The question was posed, the lot was cast, the answer of the god was deduced. The 
dice (astragali, etc.) were thrown sometimes on the ground, sometimes on a consecrated 
table. 

It was customary in classical Greece and Rome for the four astragali of the gamblers to
be used in the temples. The prediction was that the throw of Venus (1, 3, 4, 6 uppermost) 
was favourable and the dogs unfavourable. In the temple of the oracles tablets were hung 
up and the priest, or possibly the suppliant, interpreted the throw of the four bones by 
reference to the tablets. Cases have been recorded, however, where five astragali were used. 
Greek inscriptions found in Asia Minor give a fairly complete record of how the throws of 
five were interpreted. Each throw was given the name of a god. Thus Sir James Frazer 
translates (commentary on Pausanias): 

1. 3. 3. 4. 4 = 15    The throw of Saviour Zeus
One one, two threes, two fours,
The deed which thou meditatest, go, do it boldly.
Put thy hand to it. The gods have given these favourable omens.
Shrink not from it in thy mind. For no evil shall befall thee.

It is not clear whether the order of the numbers is important or not. If order does not 
matter then the probability of this throw is about 0.08.* The tesserae of the gambler were
also used in divination ceremonies as well as the astragali, and it is possible that the same 
interpretation was given to the numbers falling uppermost, although the presence of the 2 
and the 5 would make this a little awkward. 
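
The figure of about 0.08 for the throw of Saviour Zeus can be reproduced as a multinomial probability. The sketch below assumes five independent bones and the approximate side probabilities given in section 4 (1/10 for the sides scoring 1 and 6, 4/10 for those scoring 3 and 4); the function name is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

# Approximate side probabilities from section 4, keyed by score (assumed exact).
SIDE_PROB = {
    1: Fraction(1, 10),
    3: Fraction(4, 10),
    4: Fraction(4, 10),
    6: Fraction(1, 10),
}


def unordered_throw_probability(throw, probs=SIDE_PROB):
    """Probability of obtaining the given scores in any order."""
    counts = Counter(throw)
    arrangements = factorial(len(throw))
    probability = Fraction(1)
    for side, k in counts.items():
        arrangements //= factorial(k)   # multinomial coefficient, built stepwise
        probability *= probs[side] ** k
    return arrangements * probability


zeus = unordered_throw_probability([1, 3, 3, 4, 4])
print(zeus, float(zeus))
```

There are 5!/(1! 2! 2!) = 30 orderings of one one, two threes and two fours, each of probability (1/10)(4/10)^4, giving 48/625 or 0.0768.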

* I propose to write at greater length on ‘divination probabilities’ on a further occasion. 




In addition to the divination carried out by the priests it was apparently a commonplace 
for individuals to perform acts of divination with regard to events in their daily lives. Thus 
Lucian, telling the story of the young man who fell madly in love with Praxiteles’s Venus 
of Cnidos, writes: 

He threw four knucklebones on to the table and committed his hopes to the throw. If he threw 
well, particularly if he obtained the image of the goddess herself, no two showing the same number, 
he adored the goddess, and was in high hopes of gratifying his passion: if he threw badly, as usually
happens, and got an unlucky combination, he called down imprecations on all Cnidos, and was as
much overcome by grief as if he had suffered some personal loss.


Again we have from Propertius: 


When I was seeking Venus (i.e. good fortune) with favourable tali, the damned dogs always leaped 
out. 

12. It is perhaps of interest here to interpolate a note on divination as reported practised
by the Buddhists of present-day Tibet. According to Hastings (Dictionary of Comparative 
Religions) the simplest method is carried out by the people themselves. Many laymen are 
equipped with a pocket divination manual (mé-pe) and the augury found by casting lots. 
This lot-casting can either be odds and evens (the random pouring of grain, pebbles or coins 
from a horn, cup, etc.) or dice on a sacred board or cards on which there are magic signs, 
or sheets or passages of scriptures drawn from a bowl. The reincarnation prediction is, it is 
said by Waddell,* usually carried out by a priest. The rebirth chart seen by the writer 
consists of 56 2 in. squares (8 × 7). Each square corresponds to a future state. A six-sided
die with letters on it is thrown down on to the rebirth chart, and according to the 
square on which it lands and the letter which falls uppermost so the priest predicts. 
Waddell, who visited Tibet as a member of a British Mission, obtained one of these charts 
and a die (c. 1893). He remarks: ‘The dice (sic!) accompanying my board seems to have 
been loaded so as to show up the letter Y, which gives a ghostly existence, and thus neces- 
sitates the performance of many expensive rites to counteract so undesirable a fate.’ 
Possibly a similar chicanery was practised in Roman times! It would seem a reasonable 
inference anyway that the mystery and awe which the religious ceremony would lend to 
the casting of lots for purposes of divination would prevent the thinking person from 
speculating too deeply about it. Any attempt to try to forecast the result of a throw could 
undoubtedly be interpreted as an attempt to forecast the action of the deity concerned, and 
such an act of impiety might be expected to bring ill luck in its train. In addition, as we 
have noted, a method for such forecasting could not easily be made owing to the imperfec- 
tions of most of the dice. On the other hand, it is possible that probabilities were known to 
the priests since the ceremonial dice are well made. 


13. Through the Dark Ages the Christian church appears to have carried on guerilla 
warfare against gaming with knucklebones and dice. The writers of the Renaissance make 
many references to bishops who write de aleatoribus or contra aleae ludum during the first 
fourteen hundred years of the Christian era. It is likely therefore that the bishops wished 
to get rid of the sortilege as a religious ceremony, and they succeeded to a certain extent in 
doing this, although divination by lot still survives to-day in the Moravian sect. What the 
bishops could not do was to stop men playing games of chance. There are several references 
in early French literature to gaming. The play of Jean Bodel, Le Jeu de Saint Nicolas, 
written c. A.D. 1200, has a scene where thieves are gambling in a tavern. They are playing 


* L. A. Waddell, The Buddhism of Tibet, W. Heffer and Sons Ltd. 1934 (2nd edition). 




the dice game of ‘Le hasard’,* the rules of which were set down clearly by Pierre-Raymond
Montmort in his book some five hundred years later (Essay d'analyse sur les jeux de hazard,
1708, p. 113). Bodel's play is interesting for the suggestion that the thieves knew how to
manipulate the dice to produce a desired result.

14. With the invention of printing (c. 1450) and its rapid development during the latter 
half of the fifteenth century the references to games of chance become more numerous, but 
there seems to be no suggestion of the calculation of probabilities. Thus we find in the 
writing of François Rabelais, a man who might be expected to know the latest in games
of chance as played in taverns, the following interesting passage: ‘Then they studied the
Art of painting or carving; or brought into use the antic play of tables, as Leonicus hath
written of it or as our good friend Lascaris playeth at it’‡ (Gargantua and Pantagruel,
Urquhart’s translation, Book 1, Chapter xxiv). Gargantua and Pantagruel was issued in 
sections at intervals between 1532 and 1552. The date of this reference will therefore be not 
long after 1532. 

The Leonicus of the reference is Nicolas Leonicus Tomeus, a professor of Greek and Latin 
at Padua who was born at Venice in 1456. He was well known for his learning and philo- 
sophical bent and acted as tutor to the English Cardinal Pole when as a young man he 
visited Italy. According to Erasmus he was ‘a man equally respectable for the purity of 
his morals and the profundity of his erudition’. His letters, which have been translated by 
Cardinal Gasquet, give an interesting picture of the life of an intellectual of that time. He 
died at Padua in 1531 and his collected works were printed at Basel in 1532. Rabelais is 
clearly referring to Leonicus’s treatise Sannutus, sive de ludo talario, a dialogue in the 
manner of Plato concerning the game of knucklebones (astragali). There is, however, little 
relevant to the calculus of probability in this work. The discussion turns on references to 
the game in Roman literature and a description and argument of the value of the various 
types of throw. 

A similar type of disquisition was written by Calcagnini about this time but possibly a 
little later than that of Leonicus. Celio Calcagnini was born at Ferrara in 1479 and died
there in 1541. He was a poet, a philosopher and astronomer of repute; his treatise Quomodo
caelum stet, terra moveatur, vel de perenni motu terrae commentatio, in which he held that the
earth moved round the sun, anticipated Galileo Galilei by some years, for Galileo was not
born until 1564. The dissertation of Calcagnini entitled De talorum, tesserarum ac calculorum
ludis ex more veterum is less philosophical in tone than that of Leonicus. It is of interest to 
probabilists only in that it was an influence over Cardano, who, from his several references, 


* According to the editor, F. J. Warne, of the text of the play, Le Jeu de Saint Nicolas, ‘hasart’
meant the throw of a certain number of points at dice, varying according to the game played. In 
present-day probability theory the meaning is of course much wider. 

† Rabelais actually wrote ‘en usage l'anticque jeu des tables ainsi qu'en a escript Leonicus’. Duchat
in the commentary on the 1741 edition says ‘Ce n'est point tables qu'il faut lire ici, comme dans toutes


that he was referring to the ancient board game (from which the modern game of backgammon de- 
veloped) in which the ‘men’ may have been moved by throwing astragali, the counting of the throws 
being that described by Leonicus. 

‡ I have not been able to trace why Lascaris is mentioned by Rabelais. Andre-Jean Lascaris
surnamed Rhyndaconus (1445–1535), a Greek scholar born in Phrygia, was Librarian to François I.
He rescued many Greek manuscripts from the Turks. Possibly he collected references to gaming in 
Greek literature much as Leonicus did for Roman? 


15. We arrive at the sixteenth century, then, with a well-known humanist Leonicus,
and a great astronomer Calcagnini, writing on games of chance with no attempt or reference
to the calculation of a probability. (This does not mean of course that some calculations
had not been made in a manuscript which we do not know about.) There were, moreover, 
other scholars and bishops writing on the same topic about this time, so that interest in the 
subject was keen. As far as we know at present it was left to Gerolamo Cardano to make 
the step forward. Cardano, the illegitimate son of a geometer, Fazio Cardano, was born in 
Pavia in 1501. His illegitimacy was a bar to his professional advancement on more than 
one occasion, and it is possible that the bitterness engendered by this fact was responsible 
for his not too scrupulous regard for the attribution of other scientists’ ideas. The crime of 
plagiarism was a common accusation among scientific workers of the sixteenth and 
seventeenth centuries, but of none was it raised more loudly than of Cardano who was 
strongly disliked by his contemporaries and despised by his successors. Until about the 
middle of the nineteenth century his biographers unite in regarding him as a charlatan; 
possibly at the present time the pendulum has swung too far the other way, and more is 
read into his writings than is justified. The truth would seem to lie somewhere between the 
extremes of charlatan and persecuted savant. 

Cardano was physician, philosopher, engineer, pure and applied mathematician, 
astrologer, eccentric, liar and gambler, but above all a gambler. He himself owns that on 
one occasion he sold his furniture and his wife’s possessions in order to get money to indulge
his passion for gaming, and there is no doubt that this passion was one of the things which
ruled him through his whole life. His chief interest professionally was medicine, but he
interested himself also in the communication of spirits and the casting of horoscopes. He 
does not seem to have been too successful at this last, but he was not deterred from casting 
that of Jesus, a performance the impiety of which probably led to his imprisonment. Even 
allowing for the exaggerations of his biographers there seems to be no doubt that he was 
eccentric to the point of madness. This did not prevent him, however, from making con- 
tributions to pure mathematics, and it is to this combination of pure mathematician and 
gambler that we owe the Liber de Ludo Aleae. This treatise was found in manuscript in 
Cardano’s papers after his death in Rome in 1576, and was first published in his collected 
works in 1663 at Lyon. Cardano implies that it was written c. 1526; the exact date is not 
important since no question of priority or plagiarism is involved, but it is curious that a
manuscript of this kind should have survived fifty years of his remarkably variegated
career.


16. The first complete translation of de Ludo Aleae into English is given in Cardano, the 
Gambling Scholar, by Oystein Ore, published in 1953. Ore remarks that the book is badly 
composed and that understanding of Cardano’s work has possibly been hindered by this. 
There are some, however, who will not agree with his commentary on the treatise and who 
may feel that as much prescience is now attributed to Cardano as there was before too little. 


The crux of Cardano’s work is to be found in the section entitled ‘On the cast of one die’ in 
Ore’s translation: 


The talus has four faces and thus also four points. But the die has six; in six casts each point 
should turn up once; but since some will be repeated, it follows that others will not turn up. The 
talus is represented as having flat surfaces, on each one of which it lies on its back;. . .and it does not 
have the form of a die. One half of the total number of faces always represents equality; thus the 
chances are equal that a given point will turn up in three throws, for the total circuit is completed 




in six, or again that one of three given points will turn up in one throw. For example, I can as easily 
throw one, three or five as two, four or six. The wagers are therefore laid in accordance with this 
equality if the die is honest. . .. 


We have therefore the necessary abstraction made; if the die is honest, i.e. if we may give 
equal weight to each side, then we may calculate the chances. There is no doubt, I think,
that Cardano was led to this conclusion empirically and his generalization of it is partially 
wrong. For he goes on to discuss casts of two dice and three dice giving tables which are 
correct if ‘the dice be honest’. When we come, however, to the section ‘On play with 
knucklebones’, it seems that he falls into error. The knucklebones (or astragali) have four 
sides. The different combinations of numbers which may arise in the throwing of four 
astragali are correctly enumerated, but the chances are calculated under the assumption that 
all sides are equi-probable; which they are not. Possibly Cardano had never played with 
astragali, for it is likely that if he had he would have noticed that to assume the sides of the 
astragalus had equal weight in his enumeration of alternatives was not adequate. But this 
fumbling suggests that he was not quite clear in his own mind about what he was proposing. 

I do not think that the fact that Cardano did not quite see the mathematical abstraction 
clearly can detract from the fact that he did, on paper at any rate, as far as we know, 
calculate the first probability by theoretical argument, and in so doing he is the real begetter
of modern probability theory. The claims of his biographer that he anticipated the law of 
large numbers, etc., may not be acceptable; it would appear that Cardano was judging 
from his experience rather than his algebra. 


17. It would be strange if Cardano, following the mode of his age, did not communicate 
some of his thoughts about gaming to his pupils. Fear of being accused of plagiarism, fear 
of being plagiarized, may have kept him silent, but the whole tone of his treatise is a 
practical one; practical advice about playing, laying odds and so on make up a large portion 
of it. He would therefore almost certainly have discussed its contents with his friends, 
particularly if he thought about it over as long a period of time as he suggests. The fact that 
de Ludo Aleae did not appear in print until 1663 does not therefore seem to be a reason 
why Cardano’s ideas should not have been common knowledge to scholars in Italy after his 
death, and the way in which Galileo-Galilei plunges into his discussion of dice playing, 
without much preamble, tends to lend colour to this. 

Galileo-Galilei was born in Pisa in 1564, the son of Vincent Galilei, a musicographer well 
known in his day. He died in 1642 at Arcetri after a career as full of achievement as any 
that has ever been known. His contributions to science, both as astronomer and as mathe- 
matician, are striking for their originality of thought and clarity of purpose. Why this 
prince of scholars has never received the full recognition which is his due it is difficult to 
say. It is thought by some modern writers that his sensible recantation of the earth’s 
movement, after physical torture at the hands of the Inquisition at the age of 70, has caused 
a revulsion to him among the scientists of later years. This is probably not so; what is more 
likely is that the envious fellow-scholars who delivered him to the Inquisition conspired 
after his death to belittle the work which he had done. In this they were possibly helped 
by Galileo’s literary style which is noteworthy for clarity but not brevity, being in fact 
prolix and tedious in the extreme; no i is left undotted, no t is left uncrossed.* 

* E. S. Pearson suggests to me that this prolixity was one which Galileo shared with many other
Renaissance writers, and that it arose from the struggle which the early mathematicians must have 
had to formulate mathematical abstractions on paper. I think that this may well be so. 



This being so if there was any doubt about the general method of procedure in calculating 
chances with a die we should have had a long disquisition on the subject. However, in 
Sopra le Scoperte de i Dadi* he plunges straight away into his argument. The problem† is
one already touched on by Cardano. Three dice are thrown. Although there are the same 
number of three partitions of 9 as there are of 10, yet the probability of achieving 9 in 
practice is less than that of throwing 10. Why is this? I quote a little from E. H. Thorne’s 
translation of this note. The note begins: 


The fact that in a dice game certain numbers are more advantageous than others has a very 


obvious reason, i.e. that some are more easily and more frequently made than others, which depends 
on their being able to be made up with more variety of numbers. Thus a 3 and an 18, which are 
throws which can only be made in one way with 3 numbers (that is, the latter with 6, 6, 6 and the 
former with 1, 1, 1, and in no other way) are more difficult to make than, for example, 6 or 7, which 
can be made up in several ways, that is a 6 with 1, 2, 3 and with 2, 2, 2 and with 1, 1, 4 and a 7 with


1, 1, 5; 1, 2, 4; 1, 3, 3; 2, 2, 3. Again, although 9 and 12 can be made up in as many ways as 10 and 11 
and therefore they are usually considered as being of equal utility to these, nevertheless it is known 
that long observation has made dice players consider 10 and 11 to be more advantageous than 9 
and 12. 


This extract serves to show how he begins the topic assuming that the calculations are 
known; it also serves to illustrate the prolixity of Galileo’s style. After some discussion of 
the six 3 partitions of 9 and of 10, he goes on: 


Since a die has six faces and when thrown it can equally well fall on any one of these, only six 
throws can be made with it, each different from all the others. But if together with the first die we 
threw a second, which has also six faces, we can make 36 throws each different from all the others, 
since each face of the first die can be combined with each of the second. ... 


After saying that the total number of possible throws with three dice are 216, he gives a 
table of the number of possible throws for a total of 10, 9, 8, 7, 6, 5, 4, 3, noting that the 
numbers 11-18 inclusive are symmetrical with these. Thus the number of possible throws 
for 10 is 27 and 25 for 9. His treatment of the problem is exactly that which we should use 
to-day and leaves us in no doubt that the calculation of a probability from the mathematical 
concept of the equi-probable sides of the die was clearly known to the sixteenth-century 
mathematicians of Italy. We can marvel at the person asking Galileo the question; he 


obviously gambled sufficiently to be able to detect a difference in empirical probabilities 
of 1/108.‡
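
The counts quoted from Galileo's table can be checked by direct enumeration. The sketch below assumes three honest six-sided dice; it counts both the unordered three-partitions (the "ways" the dice players had in mind) and the ordered throws that Galileo tabulated. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6**3 = 216 ordered throws of three honest dice.
throws = list(product(range(1, 7), repeat=3))
ordered = Counter(sum(t) for t in throws)

# Unordered three-partitions: each distinct sorted triple counted once.
partitions = Counter(sum(t) for t in {tuple(sorted(t)) for t in throws})

for total in (9, 10):
    print(total, partitions[total], ordered[total], Fraction(ordered[total], 216))

# Difference between the chances of throwing 10 and of throwing 9.
difference = Fraction(ordered[10] - ordered[9], 216)
print(difference)
```

Both 9 and 10 have six partitions, but the partitions do not correspond to equal numbers of ordered throws, which is the whole of Galileo's explanation.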


18. Galileo’s collected works were first published in Bologna in 1656, but this fragment 
on gambling was not included. It does appear in the more complete collection published 
at Florence in 1718. Since, however, Galileo thought the problem of little interest, for he 
did not pursue it, there seems to be no reason why he should have made a secret of it, and 
following the custom of his day he probably instructed his pupils. At any rate it is evident 
that the mathematical probability set was no stranger to the French mathematicians of 
the seventeenth century, as is witnessed by the now famous correspondence between 
Pascal and Fermat in 1654. The first letter of the series, from Pascal to Fermat, setting out 
the problem of points is missing. We have, however, Fermat’s reply to it, and the subsequent
* ... title. Considerazione sopra il Giuco dei Dadi, a later title, appears first in the ...

† Like Pascal some time later, Galileo wrote to answer a problem put to him by a gambler.

‡ M. G. Kendall points out to me that the problem posed by the Chevalier de Méré to Pascal
concerning the problem of points involved similar small probabilities.




follow-up,* and from the way in which Fermat writes it seems clear that the actual defini- 
tion of probability is assumed known. What the two savants were interested in was the 
application of this definition to specific problems which were concerned with dice playing 
between gamblers of equal skill and opportunity. The approach to the problems is similar 
to that of Galileo, and the generalizations which are made from the particular cases dis- 
cussed are not well supported. 

It is true that Galileo wrote on one problem only and fairly briefly at that, but it is difficult 
to see why Pascal and Fermat should be preferred as the originators of probability theory 
before Galileo or Cardano. It may well be that the precocity of Pascal as a mathematician 
led to much of his work being accepted with acclamation, and certainly without its priority 
being questioned. We find, for example, the famous Arithmetic Triangle in Stifel's Arith- 
metic (1543), in the General Tratato of Tartaglia in 1556, in the Arithmetic of Simon Steven 
of Bruges (Leiden, 1625). It is possible that Pascal may not have known of these writers.
However, he certainly knew of Pierre Herigone's Cours Mathematique (Paris, 1634), since 
he makes several references to it in his own Usage du Triangle Arithmétique pour trouver les 
puissances des binômes et apotomes. Herigone uses a table of numbers analogous to the
Arithmetic Triangle to find binomial coefficients. Perhaps this same aura which dazzled
Pascal's contemporaries (and at the same time caused them to overlook some of Fermat's 
work) still blinds us to-day. 


19. If we take the origins for granted and look at developments of the theory, then by 
far the greatest impetus to theory during the years 1650-60 must have come from the 
publication of De Ratiociniis in Aleae Ludo by Christian Huygens. Huygens as a young 
man of 26 arrived in Paris in July 1655 on the equivalent of the English ‘Grand Tour’. 
He did not meet Pascal, Fermat or Carcavi, the intimate friend of Pascal, but he did meet 
Roberval, professor of mathematics at the Collège Royal de France, who is mentioned by
Pascal as having been also approached by the Chevalier de Méré. Huygens stayed in Paris 
from July to November, and after his return to Holland he began a correspondence with 
both Carcavi and Fermat which lasted over a period of years. The young man's imagination 
was obviously fired by the discussions he had in Paris, and his mathematical ambitions 
stimulated by the immense activity of the group which some ten years later (1665) was to 
found the Académie des Sciences. He set himself to work, and in March 1656 he wrote to 
Prof. van Schooten that he had prepared a manuscript about dice games. Francis Schooten 
was professor of mathematics at Leyden and had been Huygens's teacher. He took the 
young Huygens's manuscript (which was written in his native language), translated it into 
Latin and published it as an appendix to his Exercitationes Mathematicae in 1657. (A French 
translation of this appendix can be found in Oeuvres de Huygens, tome 14, on ‘Calcul des 
Probabilités’ published by La Société Hollandaise des Sciences in 1920.) In this Tractatus
de Ratiociniis in Aleae Ludo Huygens sets out in a systematic manner what he must have 
learnt in Paris and adds some results which he may have achieved himself. 

In the letter to Francis Schooten he writes 


... quelques-uns des plus Célèbres Mathématiciens de toute la France se sont occupés de ce genre
de Calcul, afin que personne ne m'attribue l'honneur de la première Invention qui ne m'appartient


* It is interesting to see Pascal fall into the same kind of trap which caused D'Alembert such 
controversy. In discussing the game of heads and tails and the tossing of a coin D'Alembert argued 
that the probability of throwing a head with two tosses of a coin was 2/3. For we may have TT, TH
or H—when we stop, the second throw being immaterial, since we have achieved what we want. 




pas. Mais ces savants ... ont cependant cachés leurs méthodes. J'ai donc dû examiner et approfondir
moi-même toute cette matière à commencer par les éléments, et il m'est impossible pour la raison
que je viens de mentionner d'affirmer que nous sommes partis d'un même premier principe....


Accordingly Huygens begins by proving his basic propositions, deals at some length with
the problem of points and then passes on to dice playing. His last proposition (XIV) has
a familiar ring:


If another gambler and I throw 2 dice turn and turn about with the condition that I will have 
won when I throw 7 points and he will have won when he throws 6, if I allow him to throw first, 
find my chance and his of winning. 
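
Proposition XIV can be worked in a few lines. The sketch below assumes two honest dice and sums the geometric series over rounds in which neither player succeeds; it is a modern restatement for illustration, not Huygens's own method, and the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

def chance_of_total(total):
    """Probability that two honest dice sum to `total`."""
    hits = sum(1 for a, b in product(range(1, 7), repeat=2) if a + b == total)
    return Fraction(hits, 36)

p_six = chance_of_total(6)      # the other gambler wins on a 6
p_seven = chance_of_total(7)    # I win on a 7

# A round is his throw followed by mine; it is undecided when both fail.
undecided = (1 - p_six) * (1 - p_seven)

# He throws first, so he wins a decided round with probability p_six,
# and I win it only if he has first failed.
his_chance = p_six / (1 - undecided)
my_chance = (1 - p_six) * p_seven / (1 - undecided)

print(his_chance, my_chance)
```

The advantage of the first throw very nearly offsets the smaller chance of a 6: the two chances stand in the ratio 30 to 31.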


His delineation of his fourteen propositions is admirably clear and concise, and it is no 
marvel that the tract was used by mathematicians as a reference book up to the time of 
James Bernoulli (who reprinted it) and beyond. Possibly by this crystallization of the 
ideas of the French mathematicians Huygens has earned the right to be regarded as the 
father of the probability theory. 


20. After Huygens the interest of probabilists was not solely in gaming, although this 
interest did not die away for another hundred years or so. But with Huygens the new 
calculus seems fairly launched, and this is therefore a suitable point to make a break.
There are many questions which one leaves unanswered. The drawings and paintings by 
palaeolithic man of himself are very rare, and there is probably no hope of finding pictures 
of his recreations. If he prized the astragalus as a toy it seemed a possibility that he might 
have carved or decorated it in some way, but I have not been able to find any record of 
this. But while we cannot pull aside the curtain from four hundred centuries the possibility 
does exist that the pre-historians may be able, one day, to take us back a little farther than 
the third millennium. The farther back one goes the more fragmentary the evidence, but 
the earliest dice found are described as being of ‘well-fired buff pottery’, and they certainly
would not have been the first made. 

The tantalizing period to the present writer is the period from the invention of printing 
to A.D. 1600. In this period we have two mathematicians only calculating probabilities, 
and yet this was in the immense intellectual ferment of the Italian Renaissance. It seems 
hardly possible that there were not other natural philosophers who attempted similar 
calculations, but such documents, if they exist, will only now come to light by chance.

The correspondence between the French mathematicians of the first half of the seventeenth 
century is almost complete, and presumably the possibility does exist here of finding further 
letters. They all seemed at one time or another to send letters to one friend under cover of 
letters to another, and such letters may conceivably still be ascribed to the wrong person. 
However, enough information does exist regarding the seventeenth-century mathematicians 
to make a coherent study, and if I appear to have done them scant justice it is because 
I find the period so interesting that I hope to write about it more fully on another occasion
elsewhere. 


Collecting information about dicing and gaming has been a hobby of mine for some time, 
and the list of persons who have drawn my attention to one aspect or another of it is 
formidable. I want to thank Prof. B. Ashmole of the British Museum who allowed me 
critically to examine the dice of the classical period which are in his care and M. Jean 
Charbonneaux of the Musée de Louvre who did me the same service. To Prof. C. M. Robertson
of my own college I owe not only the privilege of tossing many dice but many stimulating
discussions and useful references. A. J. Arkell allowed me to examine the dice
brought by Prof. Sir Flinders Petrie from Egypt and to photograph various gaming boards 
not reproduced here. The breadth of knowledge and wide reading of Miss M. S. Drower 
have acquainted me with many Egyptian board games which provide a fascinating puzzle 
for those interested in deducing how they are played. Miss J. Lowe and R. Graves drew 
my attention to various references in classical literature. The illustration of the Hounds 
and Jackals game is by the courtesy of the Metropolitan Museum of Art of New York. 

I want to thank Dr J. H. Willis who translated the Sive de ludo talario of Nicolas 
Leonicus for me, Dr E. H. Thorne who supplied a translation of Galileo’s letter on dice,
Prof. B. Woledge who drew my attention to early French plays and Miss J. Townend 
who drew Text-fig. 1. Miss J. Pearson and Miss J. Edmiston helped me to find many 
references and A. Munday and Miss A. Lodge helped with photographs. The manuscript 
as a whole owes much to the keen critical faculties of Prof. E. S. Pearson and Prof. 
M. G. Kendall. Part of this work was carried out with the aid of a grant from the 
Central Research Fund of the University of London. 




SOME FEATURES OF THE GENERATION TIMES 
OF INDIVIDUAL BACTERIA 


By E. O. POWELL
Microbiological Research Department, Experimental Station, Porton 


INTRODUCTION 


The dynamics of bacterial growth in the mass has been the subject of a great deal of experi- 
mental study, but quantitative observations on the behaviour of individuals are scanty. 
In spite of the primary need for detailed knowledge of normal growth processes, it is the 
abnormal and morbid that have received most attention from both experimenters and 
theoreticians. Kendall (1952a) and Finney & Martin (1951) have recognized the necessity 
for extended studies of the normal condition; in this they have been prompted by the 
existence of two hypotheses (Kendall, 1948, 1952a; Rahn, 1932), both of which connect 
the observed scatter of generation times with an inner mechanism of fission. 

Measurements of individual generation times sufficiently systematic and detailed to 
make analysis worth while have been provided only by Kelly & Rahn (1932) so far as I know. 
My object here is, first, to add to the existing corpus of data; secondly, to show that for 
technical reasons, the experimental work so far done (including my own) is not really 
adequate to test the hypotheses that have been proposed; thirdly, to suggest further lines
of study. 

According to Rahn’s hypothesis, the fission of a bacterium involves the simultaneous 
duplication of a number of essential entities or structures which may be the genes. The 
time required for duplication will not be the same for each, but will be subject to a law of 
‘chance’, and fission can only take place when every entity has been duplicated. Under 
certain assumptions, Rahn shows that the observed generation times should then be 
scattered according to the law 


$$dF = g m\,e^{-m\tau}\,(1 - e^{-m\tau})^{g-1}\,d\tau, \tag{1}$$


where dF is the proportion of generation times in the range τ to τ + dτ. The parameter g is
equal to the number of entities which have to be duplicated, and m is a parameter depending
on the rate of the individual processes. In what follows, I call the frequency function (1)
‘Yule's distribution’ (since Yule (1925) was the first to describe it).

Kendall's reasoning is similar, except that he supposes the events leading to fission to 
take place step by step, and fission to occur only when the series of steps is complete. Each 
step may or may not be a process of duplication. The observed generation times should then 
be scattered according to a Pearson Type III law in the form 

$$dF = \frac{m^{g}\,\tau^{g-1}\,e^{-m\tau}}{\Gamma(g)}\,d\tau,$$
where g is the number of steps and m is a kinetic parameter as before. 
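
As a rough numerical illustration of the two frequency functions, the sketch below codes them in the forms g m e^{-mτ}(1 − e^{-mτ})^{g−1} (Yule) and m^g τ^{g−1} e^{-mτ}/Γ(g) (Pearson Type III), and checks by a simple trapezoidal rule that each integrates to unity. The parameter values g = 4, m = 0·5 are arbitrary choices for illustration, not estimates from any data, and the function names are hypothetical.

```python
import math

def yule_density(tau, g, m):
    """Rahn's law: the longest of g independent exponential waiting times."""
    return g * m * math.exp(-m * tau) * (1.0 - math.exp(-m * tau)) ** (g - 1)

def pearson3_density(tau, g, m):
    """Kendall's law: the sum of g successive exponential waiting times."""
    return m ** g * tau ** (g - 1) * math.exp(-m * tau) / math.gamma(g)

def integrate(f, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * h)
    return total * h

g, m = 4, 0.5   # illustrative values only
yule_area = integrate(lambda t: yule_density(t, g, m), 0.0, 100.0)
yule_mean = integrate(lambda t: t * yule_density(t, g, m), 0.0, 100.0)
p3_area = integrate(lambda t: pearson3_density(t, g, m), 0.0, 100.0)
p3_mean = integrate(lambda t: t * pearson3_density(t, g, m), 0.0, 100.0)

print(yule_area, yule_mean, p3_area, p3_mean)
```

With the same g and m the two laws have different means, (1 + 1/2 + ... + 1/g)/m for the first and g/m for the second, so the parameters are not directly comparable between the two hypotheses.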

The idea of a distribution of generation times implies in practice (though this is not 

strictly necessary) some degree of permanence in the properties of the population to which it 




applies. At a given instant, the organisms in a bacterial culture certainly possess a definite 
age distribution, but this has no significance if the growth rate and cultural conditions have 
varied during the life span of the oldest extant organism. A fortiori, a distribution of genera- 
tion times, even if it be formally defined at an instant, will be a complex and imperspicuous 
character of its population unless the factors controlling it remain constant over at least 
several generations. Conditions adequate to the precise definition and determination of 
a distribution probably exist only, if at all, in a continuous culture maintained in a steady 
state. No measurements have yet been made on such a system, and I have assumed here 
(as has been done implicitly in the past) that during the phase of logarithmic growth in
static culture the succession of generations proceeds in a regular manner. I give, however,
two examples to the contrary. 

It is usually found that the mass growth rate (the logarithmic derivate of the mass of 
organisms per unit volume) in a static culture rapidly accelerates at the end of the lag phase 
to a steady value which is maintained for several hours. The number growth rate (the 
logarithmic derivate of the number of organisms per unit volume) often behaves less regu- 
larly. It may be small or zero even after the mass growth rate has reached its maximum 
value; abnormally long organisms are thereby formed. There follows a stage of acceleration 
during which the number growth rate approaches and temporarily exceeds the mass
growth rate. The two then become equal, in some cases only for a short time before the phase 
of decline sets in. These remarks apply to culture in a liquid medium. Detailed quantitative 
information about the growth of colonies on solid media over long periods (i.e. of many 
generations) appears to be lacking, and it is only on solid media that observations of in- 
dividual generation times can be made. With these difficulties in mind, it is clear that a 
satisfactory experimental procedure is not easily attained. Observations which are to be 
combined so as to furnish a distribution curve must either be confined to a brief period in 
the life of each culture examined, or, if extended, they must be shown to be drawn from a 
sensibly unchanging population. 

In a culture whose number growth rate is varying there is temporarily an excess or 
deficit of young organisms; a change in the growth rate implies a change in generation time 
distribution. It does not necessarily imply a change in all the parameters of that distribu- 
tion, but even if some one were known or assumed to be constant, it would be a matter of 
considerable practical difficulty to extract its value from a set of data derived from a 
changing population. At present it is not possible to test or to make use of Kendall’s (1948) 
relation between the coefficients of variation of clone size and generation time; a method of 
maintaining a constant growth rate for many generations must first be established. 

The results which I have to present consist of measurements of generation times carried 
out on six species of organisms (Bacterium aerogenes, strain N.C.T.C. 8197; Bact. coli 
anaerogenes, strain N.C.T.C. 4450; Streptococcus faecalis; Proteus vulgaris, strain LO; Bacillus 
mycoides, strain SR 2; B. subtilis, strain UP 1). Their distributions are compared with the 
frequency functions suggested by Kendall and Rahn. A partial analysis and discussion of 
some other features of the nexus of generation times are also given. I use the words 
‘mother’, ‘daughter’, ‘sister’ with an obvious extension of their usual meaning, and the 
neutral terms ‘inception’, ‘termination’ to denote respectively the events at which an 
organism becomes a recognizably separate entity (by fission of its mother) and by which it 
ceases to be so (by itself dividing). The same terms may also refer to the epochs of these 
events. In describing the morphological changes in the cell wall occurring at fission I replace






Bisset’s (1950) ‘rough’ and ‘smooth’ (which are overworked in bacteriology as well as in 
everyday use) by the mnemonic ‘septate’ and ‘isthmoid’ respectively (see de Bary, 1884; 
Schaudinn, 1902).

In view of the present uncertainty about the internal organization of the bacterial cell, 
it is misleading to speak of ‘the nucleus’ as a single discrete entity. I have retained the word 
for convenience, but it should be considered to mean no more, perhaps, than ‘the complement
of nuclear material’.


EXPERIMENTAL ARRANGEMENTS 
(a) Method of making observations : allowance for bias 
If observations of generation time are made on all the progeny of a single organism, and if 
the experiment is broken off at an arbitrary moment, the resulting sample will be biased in 
favour of short-lived organisms. For the organisms extant at the end of the experiment will 
have, in mean, a generation time greater than the average. The bias is by no means trivial, 
as I shall show. (Particular cases can be imagined in which these statements are not true; 
they are of no practical consequence.) An unbiased sample is obtained if observations are




Fig. 1. A family tree of micro-organisms, 
illustrating bias. 


Fig. 2. A family tree of B. mycoides. The numbers 
give the life span of the organisms in minutes. 


made of the same number of generations in every line of descent from the ancestor— 
provided that that number is decided beforehand and not allowed to be contingent upon 
the incidence of meal-times, for example. In the diagram of Fig. 1, the horizontal lines 
represent, against a uniform time scale, the life-spans of individuals in a family tree. The 
clone formed at the time X consists of six organisms of the third generation and four of 
the fourth; the times a ... h form a biased sample, by inclusion of g and h. The times a ... f,
i.e. two complete generations from the ancestor A, are without bias. 
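
The size of this bias is easy to see in a small simulation. The sketch below is an illustration under stated assumptions, not a reconstruction of any experiment: generation times are drawn independently from a gamma law with a hypothetical mean of 30 min. and shape 4, each family starts from an ancestor dividing at time 0, and the "arbitrary moment" is taken as three mean generation times. All names and values are illustrative.

```python
import random

MEAN_TAU = 30.0   # hypothetical mean generation time, minutes
SHAPE = 4         # hypothetical gamma shape, for illustration only

def draw_generation_time(rng):
    # Gamma-distributed generation time with mean MEAN_TAU.
    return rng.gammavariate(SHAPE, MEAN_TAU / SHAPE)

def times_until_stop(rng, stop_time):
    """Generation times completed before stop_time among the progeny of
    one ancestor dividing at time 0 (experiment broken off arbitrarily)."""
    completed = []
    births = [0.0, 0.0]                 # the two daughters of the ancestor
    while births:
        born = births.pop()
        tau = draw_generation_time(rng)
        if born + tau <= stop_time:     # fission observed before the cut-off
            completed.append(tau)
            births += [born + tau, born + tau]
    return completed

def times_fixed_generations(rng, generations):
    """Generation times of every organism in a fixed number of complete
    generations from the ancestor (the number being decided beforehand)."""
    return [draw_generation_time(rng)
            for gen in range(1, generations + 1)
            for _ in range(2 ** gen)]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1955)               # fixed seed so the run is repeatable
cut_off, fixed = [], []
for _ in range(2000):
    cut_off += times_until_stop(rng, stop_time=3 * MEAN_TAU)
    fixed += times_fixed_generations(rng, generations=2)

biased_mean = mean(cut_off)
unbiased_mean = mean(fixed)
print(round(biased_mean, 1), round(unbiased_mean, 1))
```

The sample cut off at a fixed time has a mean well below the true mean, while the sample of two complete generations does not.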

The method of observing every member of one or more trees of this sort, up to a fixed 
number of generations in every line of descent, was apparently adopted by Kelly & Rahn 
(1932), though their figures suggest that it was not adhered to quite rigidly. For organisms 
whose generation times are relatively little dispersed, the method is satisfactory. 

When the scatter of generation times is large, the foregoing method becomes inefficient, 
for the total duration of a given number of generations is also widely scattered, and in the 
course of an experiment many possible observations will have to be rejected if bias is to be 




avoided. An extreme example is shown in Fig. 2. To meet the difficulty, which arose in 
studying B. mycoides and B. subtilis, I adopted the device of systematizing experiments so
that bias could be accepted and afterwards allowed for in a simple way. 

Consider a group of organisms, n₀ in number at an arbitrary time t = 0, whose growth is
followed for a period T (n₀ is supposed to be great enough for the growth to be regarded as
continuous). In the period from 0 to t there will be $n_0(e^{kt} - 1)$ fissions and so $n_0(e^{kT} - 1)$ in the
whole period T. (Here k is the mean growth rate.) But an organism of generation time τ will
terminate within the period T only if its inception occurs at a time t < T − τ. Thus of all the
organisms whose generation time is τ and whose inception lies within the period of observation,
only a fraction

$$\alpha(\tau) = \frac{e^{k(T-\tau)} - 1}{e^{kT} - 1} \tag{2}$$

will on the average terminate, and so be measurable, in that period.

It is now possible to make use of all the observations available from studying the repro- 
duction of organisms for a fixed time instead of for a definite number of generations. If 
the true distribution is

$$dF = f(\tau)\,d\tau,$$

the observed distribution will be

$$dF = B\,\alpha(\tau)\,f(\tau)\,d\tau, \tag{3}$$

with a necessarily finite range 0 < τ < T. The factor B is chosen so as to make

$$\int_0^T dF = 1.$$

Since generation times exceeding T are excluded, f(τ) can only be found as a distribution
truncated at T; in practice the loss is negligible.
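
The allowance can be applied in reverse to grouped observations: dividing each observed frequency by the biasing factor of (2) and renormalizing gives an estimate of the true distribution truncated at T. The sketch below is a minimal illustration of that step only; the counts, the value of k and the function names are hypothetical, and the fitting of frequency functions described in the text is not attempted.

```python
import math

def alpha(tau, T, k):
    """Fraction of organisms of generation time tau, with inception inside
    the period of observation T, that also terminate inside it (eq. 2)."""
    return (math.exp(k * (T - tau)) - 1.0) / (math.exp(k * T) - 1.0)

def correct_for_bias(observed, T, k):
    """observed maps generation time tau (0 < tau < T) to a count.
    Returns estimated true proportions, truncated at T."""
    weights = {tau: n / alpha(tau, T, k) for tau, n in observed.items()}
    total = sum(weights.values())
    return {tau: w / total for tau, w in weights.items()}

# Hypothetical grouped counts from an 80 min. experiment.
T = 80.0
k = math.log(2.0) / 28.0          # assumed growth rate, per minute
observed = {20.0: 30, 30.0: 45, 40.0: 15}

for tau, p in sorted(correct_for_bias(observed, T, k).items()):
    print(tau, round(p, 3))
```

Because the factor falls steeply with τ, the longer generation times receive the larger upward correction.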

Accordingly, all the experiments on B. mycoides and B. subtilis were arranged to occupy 
a period of 80 min. (about three mean generations), so that the results could afterwards 
be combined and the above average allowance for bias made when fitting frequency 
functions.

The factor α(τ) has a surprisingly large effect. For example, when τ has its mean value
τ̄ and T is 3τ̄, α(τ) is 3/7 (taking kτ̄ = ln 2). With very large T, α(τ) is nearly exp(−kτ) and so is still as much
as ½ for τ = τ̄. Its introduction is no matter of pedantry.
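
The size of the factor can be checked directly from (2). The working below assumes, as a simplification not stated in the text, that the population doubles in one mean generation time, so that kτ̄ = ln 2; the symbol α for the biasing factor follows the reading of (2) adopted here.

```latex
\begin{align*}
\alpha(\bar\tau)\Big|_{T = 3\bar\tau}
  &= \frac{e^{2k\bar\tau} - 1}{e^{3k\bar\tau} - 1}
   = \frac{4 - 1}{8 - 1} = \frac{3}{7}, \\
\lim_{T \to \infty} \alpha(\tau)
  &= \lim_{T \to \infty} \frac{e^{k(T-\tau)} - 1}{e^{kT} - 1}
   = e^{-k\tau},
  \qquad\text{so that}\qquad
  \alpha(\bar\tau) \to e^{-\ln 2} = \tfrac{1}{2}.
\end{align*}
```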

This reasoning can only be correct if (i) n₀ is very large; (ii) the culture has been growing
steadily for a sufficiently long time for a stable age distribution to have been established;
(iii) the n₀ are a fair sample from that distribution. Dr W. A. O'N. Waugh (private communication)
has given a more exact treatment for cases in which these conditions are not
met, but his calculations show that a few generations of growth are enough to satisfy (ii). 
The circumstances of the actual experiments and the tests applied for constancy of growth 
rate indicate that the error in taking α(τ) as the biasing factor is negligible. It may very
well be that there is always a human bias in the selection of organisms for observation, 
contrary to condition (iii), but it is scarcely possible to take account of it or to eliminate it 
without complicating the technique. 

Observations of the growth of the selected organisms were made at intervals of 2 min., to 
correspond roughly with the uncertainty in estimating times of fission. The uncertainty in
generation time was then about ± 3 min. This is an average figure only; the uncertainty
was greater with organisms of unusually long generation time, which generally divided in 


a leisurely manner. On the other hand, unusually short generation times were estimated 




much more accurately since inception and termination took place in close juxtaposition 
both temporally and spatially ; records of generation times of 2, 4 and 6 min. may be taken 
as correct to ± 1 min., and therefore also correct as to their frequency. As it turns out, this
is an important consideration. 

The uncertainty introduces an error of a special kind. A generation time which should 
be recorded as τ may be recorded as τ − 2 or τ + 2 for instance; it will then appear in a frequency
group adjacent to the correct one, and so there will be correlated errors in the
frequencies. The effect will increase the variance, probably to a quite negligible extent, but
there may also be a concomitant error in the χ² between the observations and a fitted function,
making the fit seem better or worse than it is. I have found no way of estimating or
allowing for this aberration. 

I wish to make it quite clear what is here meant by a measured generation time τ. The
fission of an organism is first observed to be complete, as nearly as can be judged, at a time
t₁ say. It has therefore divided at some time between t₁ − 2 and t₁. Similarly, one of its
daughters terminates at some time between t₂ − 2 and t₂. Then the generation time of that
daughter is recorded as t₂ − t₁, which is always a multiple of 2. Thus τ is the mean value of
the generation time of the group of organisms to which it is ascribed. I make this simple 
point because a careful study of Kelly & Rahn’s (1932), Rahn’s (1932) and Finney & 
Martin's (1951) papers raises doubts as to what Kelly & Rahn have actually recorded. 
Their figs. 1 and 2 suggest that they proceeded as I have done (though with a 5 min. instead 
of a 2 min. interval between observations), but in their tables the observations are grouped 
into ranges 0-5, 5-10, 10-15, etc. Rahn and Finney & Martin have the misleading phrase 
‘fissions ... observed in successive five-minute intervals’. Finney & Martin assumed that the 
mean generation times for these intervals were 2½, 7½, 12½, ...; they may have been
2½, 5, 10, 15, ... or 5, 10, 15, 20, .... I have verified that if either of these alternatives is chosen,
the goodness of fit of Yule’s distribution is about the same as that obtained by Finney &
Martin (χ² = 14·9), but the corresponding estimates of g are widely and significantly different:
18·2 and 36·2 respectively, as against 26·0.

Other observational arrangements are possible. For instance, the distribution of generation
times can be deduced from the age distribution in a culture. If φ(a) is the frequency
function of age and f(τ) that of generation time, it is found that if the culture is growing
steadily

$$\phi(a) = 2k\,e^{-ka}\int_a^\infty f(\tau)\,d\tau \qquad (4)$$

(cf. Harris, 1951). Unfortunately, the form of the curve φ(a) is dominated by the value of k;
it is very insensitive to changes in the parameters of f(τ) that leave k the same.
Experimentally, the determination of an age distribution at a given time requires a study of the
previous history of the culture over a period long enough to give the generation time
distribution directly. I have, however, made use of the relation (4) in the reverse sense, as a
check on the regularity of growth in experiments with B. mycoides and B. subtilis.
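As an illustration of relation (4), the following minimal sketch evaluates the age distribution for a Pearson Type III generation-time law. It assumes an integer shape parameter g, so that the upper tail has a closed form, and it takes k from the steady-growth condition 2E[exp(−kτ)] = 1; the function names and the parameter values are illustrative only and do not come from the experiments.

```python
import math

def erlang_survival(a, m, g):
    """P(generation time > a) for a Type III law with integer shape g and scale m."""
    x = a / m
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(g))

def growth_rate(m, g):
    """k solving 2 * E[exp(-k * tau)] = 1, i.e. (1 + k*m)**(-g) = 1/2 (assumed condition)."""
    return (2.0 ** (1.0 / g) - 1.0) / m

def age_density(a, m, g):
    """phi(a) = 2 k exp(-k a) * integral from a to infinity of f(tau) dtau, relation (4)."""
    k = growth_rate(m, g)
    return 2.0 * k * math.exp(-k * a) * erlang_survival(a, m, g)

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for i in range(1, steps):
        total += func(lo + i * h)
    return total * h

# Hypothetical parameters: mean generation time m*g = 24 min.
m, g = 2.0, 12
area = integrate(lambda a: age_density(a, m, g), 0.0, 200.0)
print("integral of phi(a):", area)
print("phi(0) and 2k:", age_density(0.0, m, g), 2.0 * growth_rate(m, g))
```

With k chosen in this way the density integrates to unity, and its value at age zero is 2k whatever the shape of f(τ), which is one way of seeing why the curve is dominated by k.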

Again, suppose that a number of organisms extant at the epoch t = 0 are watched until
they divide. The times t = θ of fission will be distributed with a frequency function

$$\psi(\theta)\,d\theta = 2k\,e^{k\theta}\int_\theta^\infty e^{-k\tau} f(\tau)\,d\tau\,d\theta.$$

This gives another possible method of determining f(τ); I have made no use of it.


E. O. POWELL 21


(b) Practical details

With one exception, organisms were grown on cellophane over tryptic meat broth in the
culture chamber described by Harris & Powell (1951). The medium was constantly circulated
so that an approximation to a condition of continuous culture obtained; the volume of
medium (100 c.c.) was so large relative to the inoculum used (c. 100 organisms) that no
appreciable change could occur during the course of an experiment, either by depletion of
nutrients or accumulation of products of metabolism. In this respect the technique is
superior to that of Orskov (1922) adopted by Kelly & Rahn. On the other hand, times of
fission can be judged with less precision in cellophane culture; the average error appeared
to be about ± 2 min. However, this contribution to the total variance is only a small fraction
of that due to the organisms themselves.

The cultures were incubated throughout at a temperature of 35 ± 0.5° C. The two Bacillus
species were inoculated as spores, the rest as saline dilutions from a 20 hr. liquid culture.
To avoid confusion, the organisms on which observations were to be made were separated
from their neighbours by means of a micromanipulator so that each lay within a clear area
of cellophane. They were exposed to light as little as possible. Evaporation from the surface
of the cellophane was controlled so that each organism or group of organisms was
surrounded by a visible fillet of liquid (Harris & Powell, 1951). The evaporation was
temporarily increased only when necessary to verify the occurrence of fission. A photographic
method of recording was rejected after trial as less informative than direct examination,
and no less tedious when the processing and assessment of films were taken into account.

No difficulty was encountered in working with Bact. aerogenes and Bact. coli anaerogenes. 
A typical experiment consisted in following the development of four organisms simul- 
taneously for two or three complete generations. Organisms were selected for observation 
always at about 2hr. from inoculation, previous experience having shown that erratic 
behaviour at the end of the lag phase would by then have subsided. 

Strep. faecalis was found to be very sensitive to light; a few minutes' exposure to the usual
conditions of illumination sufficed to stop growth. The effect was minimized by adding
5% of sheep serum to the medium and illuminating with red light as dim as could be
tolerated. As a further precaution, observation was confined to one complete generation
only of the progeny of the selected organisms. The numerical results suggest that these
measures were effective in securing uniform unrestricted growth over the short period
necessary. Observation was begun at 2½ hr. from inoculation.
The mode of fission of Pr. vulgaris in cellophane culture was erratic, and seemed sometimes
to be intermediate in character between the septate and the isthmoid. The first signs of
a waist were apparently accompanied by the formation of a septum, and the organisms would
remain for several minutes in this state of incipient fission. Attempts to assess generation
times were unsatisfactory, and this species was therefore studied in culture on an
agar-tryptic meat medium under incident dark-field illumination (Pearce & Powell, 1951).
Judging by the lengths of organisms as then seen, the uncompleted fissions observed by the
vertical illumination on cellophane were recorded as complete fissions under the dark field.
For this organism, then, the conditions of culture were similar to those in Kelly & Rahn’s
experiments. For 3½–4 hr. from inoculation it appeared to grow and reproduce steadily,
but then the rate of fission declined, longer motile organisms were formed and swarming
began. In order not to encroach on this phase of changing growth rate, observations were

22 Some features of the generation times of individual bacteria 


again confined to one generation only, beginning at 2½ hr. from inoculation. (This organism
did not swarm on cellophane over a circulating medium, presumably because no local 
concentration of metabolic products could be built up. See Lominski & Lendrum, 1947.) 

Observations on B. mycoides were carried out during the 80 min. beginning at 2½ hr. from
inoculation. By that time the spore coat had been shed and families of two to four organisms
developed. Three groups of experiments were conducted with B. subtilis: series (i),
observation begun at 2½ hr.; series (ii), observation begun at 4½ hr.; series (iii), organisms growing
over defibrinated sheep blood diluted with its own volume of normal saline; observation
begun at 3 hr. In order to check the regularity of growth of these two species, the ages of the
organisms extant at the end of each experiment were recorded, so that their distributions
could be compared with those calculated by means of (4) from the generation time
distributions.


THE FREQUENCY FUNCTIONS 


I give in this section a statement of the frequency functions which are made use of below,
together with their relevant properties. Each is limited to positive values of the variate,
and each involves a parameter g which, whatever its interpretation, is a measure of intrinsic
dispersion, i.e. a function of the coefficient of variation only.

Fitting was carried out either by the method of maximum likelihood or by that of moments
(sometimes both). In several of the examples the distributions are such that the method of
moments is of low efficiency (cf. Fisher, 1922). However, I do not think that the results are
of much more than comparative value in any case, and the points I shall have to make are
not so nice as to call for great statistical refinement.


(a) The Pearson Type III distribution (Kendall’s hypothesis)

$$dF = \frac{e^{-\tau/m}\,\tau^{g-1}}{m^{g}\,\Gamma(g)}\,d\tau.$$

This is a well-known function. The cumulants are

$$\kappa_r = m^{r} g\,\Gamma(r),$$

whence the first three moments

$$\mu_1' = mg,\qquad \mu_2 = m^{2}g,\qquad \mu_3 = 2m^{3}g.$$

The coefficient of variation is

$$c_v = g^{-1/2},$$

and the skewness and kurtosis

$$\gamma_1 = \kappa_3/\kappa_2^{3/2} = 2g^{-1/2},\qquad \gamma_2 = \kappa_4/\kappa_2^{2} = 6g^{-1}.$$

Both tend to zero for large g.
Maximum-likelihood estimates ĝ, m̂ of g and m are given by

$$\sum f\tau/n - \hat m\,\hat g = 0,$$

$$\sum f\log\tau/n - \log\hat m - \psi(\hat g) = 0,$$

where the f are observed frequencies of the generation times τ, n is Σf and ψ(g) is
d log Γ(g)/dg.
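The two maximum-likelihood equations for the Type III distribution reduce to a single equation in g, namely log g − ψ(g) = log(mean τ) − mean(log τ), after which m follows from the mean. A minimal sketch of one way to solve them is given below; it is not the computing procedure used for the paper, the digamma routine is a simple recurrence-plus-asymptotic-series approximation written for the purpose, and the frequency table is hypothetical.

```python
import math

def digamma(x):
    """psi(x) = d log Gamma(x)/dx for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def fit_type3(times, freqs):
    """Return (g, m) satisfying the two maximum-likelihood equations for grouped data."""
    n = sum(freqs)
    mean = sum(f * t for t, f in zip(times, freqs)) / n
    mean_log = sum(f * math.log(t) for t, f in zip(times, freqs)) / n
    s = math.log(mean) - mean_log          # positive unless all times are equal
    lo, hi = 1e-3, 1e6                     # log g - psi(g) decreases as g increases
    for _ in range(200):
        mid = math.sqrt(lo * hi)           # bisection on a logarithmic scale
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    g = math.sqrt(lo * hi)
    return g, mean / g

# Hypothetical frequencies of generation times (minutes), for illustration only.
times = [10, 14, 18, 22, 26, 30, 34]
freqs = [2, 9, 20, 24, 15, 6, 2]
g_hat, m_hat = fit_type3(times, freqs)
print("g =", g_hat, " m =", m_hat)
```

The moment estimates g = μ₁′²/μ₂ and m = μ₂/μ₁′ could equally serve as a starting point for a Newton iteration; bisection is used here only because it needs no derivative of ψ.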


(b) Yule's distribution (Rahn's hypothesis)

$$dF = \frac{g}{m}\,e^{-\tau/m}\left(1 - e^{-\tau/m}\right)^{g-1} d\tau.$$

Except for μ₁′, the moments are difficult to calculate and are best obtained through the
cumulants (cf. Daniels, 1952; but see Yule, 1925). The cumulant generating function is
easily found to be

$$K(t) = \log\Gamma(g+1) + \log\Gamma(1 - itm) - \log\Gamma(g+1-itm),$$

whence

$$\kappa_r = (-m)^{r}\{\Gamma_r(1) - \Gamma_r(g+1)\},$$

where for uniformity I write

$$\Gamma_r(x) = (d/dx)^{r}\{\log\Gamma(x)\},\qquad \Gamma_1(x) = \psi(x).$$

In particular,

$$\mu_1' = \kappa_1 = m\{\Gamma_1(g+1) - \Gamma_1(1)\}, \qquad (5)$$

$$\mu_2 = \kappa_2 = m^{2}\{\Gamma_2(1) - \Gamma_2(g+1)\}, \qquad (6)$$

$$\gamma_1 = \frac{\Gamma_3(g+1) - \Gamma_3(1)}{\{\Gamma_2(1) - \Gamma_2(g+1)\}^{3/2}},$$

$$\gamma_2 = \frac{\Gamma_4(1) - \Gamma_4(g+1)}{\{\Gamma_2(1) - \Gamma_2(g+1)\}^{2}}.$$

The curve is distinguished from a Type III having the same first two moments by its
slower rise near zero, and by its greater steepness on the left flank. In fact, for increasing g,
γ₁ diminishes but tends to a still rather large non-zero limit

$$-\pi^{-3}\,6^{3/2}\,\Gamma_3(1) = 1.14.$$

In practice γ₂ scarcely differs from 2.4 (= 3!(π⁴/90)(6/π²)²).

Some other features are discussed by Finney & Martin (1951) and Rahn (1932), who note
the insensitivity of the form of the curve to changes in g.
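The limiting values of γ₁ and γ₂ can be checked numerically. The sketch below assumes an integer g, for which the differences of polygamma functions in the cumulants reduce to finite sums, κ_r = m^r (r − 1)! Σ j^(−r) with j running from 1 to g; this restriction is made here only for convenience and is not required by the distribution itself.

```python
import math

def yule_cumulant(r, m, g):
    """kappa_r = m**r * (r-1)! * sum_{j=1..g} j**(-r), valid for integer g (assumed)."""
    return m ** r * math.factorial(r - 1) * sum(j ** (-r) for j in range(1, g + 1))

def skewness_kurtosis(g, m=1.0):
    """gamma_1 and gamma_2 of Yule's distribution; neither depends on the scale m."""
    k2 = yule_cumulant(2, m, g)
    k3 = yule_cumulant(3, m, g)
    k4 = yule_cumulant(4, m, g)
    return k3 / k2 ** 1.5, k4 / (k2 * k2)

for g in (1, 5, 20, 100000):
    gamma1, gamma2 = skewness_kurtosis(g)
    print(g, round(gamma1, 4), round(gamma2, 4))
```

For g = 1 the distribution is exponential, and the sketch returns the familiar values γ₁ = 2 and γ₂ = 6; for very large g the values settle near 1.14 and 2.4, as stated above.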

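The maximum-likelihood equations for fitting are

$$\frac{n}{\hat g} + \sum f \log\left(1 - e^{-\tau/\hat m}\right) = 0,$$

$$\sum f\,\frac{\tau}{\hat m} - (\hat g - 1)\sum f\,\frac{\tau/\hat m}{e^{\tau/\hat m} - 1} - n = 0.$$

The solution is greatly facilitated by the tables of Einstein functions due to Sherman &
Ewell (1942).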


(c) The Pearson Type III distribution with allowance for bias
From equation (3), if the true distribution is of Pearson Type III, that derived from data
which are biased in the way already described will be

$$dF = \beta\,\alpha(\tau)\,\frac{\tau^{g-1} e^{-\tau/m}}{m^{g}\,\Gamma(g)}\,d\tau, \qquad (7)$$

where α(τ) is given by equation (2). This is a frequency function in the range 0 < τ < T
(T = 80 in all experiments); the moments strictly involve incomplete gamma functions, but
these are here so nearly unity that no appreciable error arises in taking the range of
integration as infinite. And then it is found that

$$\mu_1' = mg\left\{\frac{e^{kT}(1+km)^{-g-1} - 1}{e^{kT}(1+km)^{-g} - 1}\right\},$$

$$\mu_2' = m^{2}g(g+1)\left\{\frac{e^{kT}(1+km)^{-g-2} - 1}{e^{kT}(1+km)^{-g} - 1}\right\},$$

with k = (log 2)/(mg). The parameters are hence obtained by iteration; approximate values
of m and g are inserted into the expressions in curly brackets, and the two equations for
the moments solved to give a closer approximation which is used as the starting point for
a second cycle, and so on. For the terminus a quo, it is enough to take m = 1, and
τ̄ = gm = 25 min. (a value roughly known a priori).
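The iteration described in this section may be sketched as follows. The bracketed factors are written as {e^(kT)(1 + km)^(−g−j) − 1}/{e^(kT)(1 + km)^(−g) − 1}, with j = 1 for the first moment and j = 2 for the second moment about the origin; that form is an assumption of the sketch, as are the function names and the trial parameter values. The check is a round trip: moments are generated from chosen values of m and g and the iteration is asked to recover them.

```python
import math

T = 80.0  # duration of each experiment in minutes, as stated in the text

def bracket(m, g, j):
    """Assumed bias factor {e^{kT}(1+km)^(-g-j) - 1} / {e^{kT}(1+km)^(-g) - 1}."""
    k = math.log(2.0) / (m * g)
    growth = math.exp(k * T)
    top = growth * (1.0 + k * m) ** (-g - j) - 1.0
    bottom = growth * (1.0 + k * m) ** (-g) - 1.0
    return top / bottom

def biased_moments(m, g):
    """First and second moments about the origin of the biased distribution."""
    mu1 = m * g * bracket(m, g, 1)
    mu2 = m * m * g * (g + 1.0) * bracket(m, g, 2)
    return mu1, mu2

def fit_by_iteration(mu1_obs, mu2_obs, cycles=200):
    """Insert current m, g into the brackets, solve for new m, g, and repeat."""
    m, g = 1.0, 25.0                       # starting point: m = 1, mean = g*m = 25 min
    for _ in range(cycles):
        a = mu1_obs / bracket(m, g, 1)     # current estimate of m*g
        b = mu2_obs / bracket(m, g, 2)     # current estimate of m^2 * g * (g + 1)
        variance = b - a * a               # equals m^2 * g
        m, g = variance / a, a * a / variance
    return m, g

# Round trip with hypothetical parameter values.
m_true, g_true = 3.8, 6.1
mu1, mu2 = biased_moments(m_true, g_true)
m_fit, g_fit = fit_by_iteration(mu1, mu2)
print("biased mean:", mu1, " unbiased mean:", m_true * g_true)
print("recovered m, g:", m_fit, g_fit)
```

In this trial the biased mean falls below the true mean mg, which is the direction expected when long generation times are under-represented in an experiment of fixed length.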


(d) Yule’s distribution with allowance for bias

$$dF = \beta\,\alpha(\tau)\,\frac{g}{m}\,e^{-\tau/m}\left(1 - e^{-\tau/m}\right)^{g-1} d\tau. \qquad (8)$$


With the same approximation as in the previous paragraph,


$$\beta = (e^{kT} - 1)/(H - 1),$$
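$$\mu_1' = m\{\Gamma_1(g+1) - \Gamma_1(1)\}\left[H\,\frac{\Gamma_1(g+1+km) - \Gamma_1(1+km)}{\Gamma_1(g+1) - \Gamma_1(1)} - 1\right][H-1]^{-1},$$

$$\mu_2' = m^{2}\left[\Gamma_2(1) - \Gamma_2(g+1) + \{\Gamma_1(g+1) - \Gamma_1(1)\}^{2}\right]$$
$$\qquad\times\left[H\,\frac{\Gamma_2(1+km) - \Gamma_2(g+1+km) + \{\Gamma_1(g+1+km) - \Gamma_1(1+km)\}^{2}}{\Gamma_2(1) - \Gamma_2(g+1) + \{\Gamma_1(g+1) - \Gamma_1(1)\}^{2}} - 1\right][H-1]^{-1},$$

where k = (log 2)/m{Γ₁(g + 1) − Γ₁(1)},

$$H = e^{kT}\,\frac{\Gamma(g+1)\,\Gamma(1+km)}{\Gamma(g+1+km)}.$$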


After fitting the function (7) to a set of data, estimates of the true moments were calculated 
from its parameters, and were used to obtain values of the parameters of (8) by means of 
(5) and (6). These values were taken as the first approximation in fitting (8), iteration being 
carried out as in the previous section. 


ANALYSIS OF THE DATA 


General 


The fission of a micro-organism, as usually understood, consists in its separation into two 
geometrically distinct pieces, each with a closed cell wall separating it from its surroundings. 
The closure of the wall follows the division of the cytoplasm at some unknown interval, but 
in unicellular organisms there is at least a unique one-to-one correspondence between the 
true cell division and the observed fission. In the genus Bacillus this is not so; the organisms
are multicellular, and at any time contain party walls of various degrees of completeness.
The party walls may or may not be laid down as double layers, but they cannot be seen to
be so, and it is not known that each becomes necessarily a site of fission; in these organisms
fission is a tertiary rather than a secondary process (for recent work on the structure of the 
party wall, see Dawson & Stern, 1954). Thus the generation times of B. mycoides and 
B. subtilis have not the significance for the hypotheses of Rahn and Kendall that those of 
the other organisms have. (The same applies to Kelly & Rahn's measurements on B. cereus;
Rahn recognized this.) There is no a priori reason why either hypothesis should not apply, 
as regards its mathematical form, to the Bacilli, but the ‘genetic’ parameter g cannot then 
be interpreted as the number of genes or primitive steps determining fission. For the 
moment, little more than a descriptive treatment of the generation times of these organisms 
can be usefully attempted in this context. 

The raw data are set out in Table 1, together with the first two moments and the skewness 
and kurtosis of each distribution. The range of generation times is very great, and much at 
variance with the vague assumption often made, that bacteria divide rather regularly. 
A striking feature in all the experiments was the highly erratic behaviour of small families 
of organisms (e.g. the six or fourteen constituting the progeny, up to two or three genera- 
tions, of a single ancestor); in some the generation times are quite uniform, in others widely 
dispersed. This unforeseen lack of homogeneity makes statistical analysis difficult, and 
renders doubtful the significance of any broad and simple treatment. Such quantitative 
analysis as I have been able to apply can only be a makeshift; but a more refined investiga- 
tion will not be justified until a larger range of data has accumulated. 

In the paragraphs which follow, some forward references are unavoidable; homogeneity
is considered first because it affects the methods adopted in assessing the parameter g, 
but the results of curve-fitting are also relevant, and have been used to modify the tests of 
homogeneity. 

Homogeneity of the data 

Because of the special difficulties associated with B. mycoides and B. subtilis, only the
four unicellular organisms Bact. aerogenes, Bact. coli anaerogenes, Strep. faecalis and 
Pr. vulgaris will be considered in this section. 

Kendall (1948) by applying Bartlett’s (1937) test for homogeneity of variance, showed 
that the estimates of his g obtained from the several experiments of Kelly & Rahn were 
highly inconsistent; and later (1952a) he expressed doubts as to the propriety of combining
the data to obtain an overall estimate of g. Such doubts are of course justified if the source
of variation is a change in the experimental conditions, and certainly Kelly & Rahn’s 
results strongly suggest that day-to-day fluctuations are considerable. 

Any one experiment of the present series yields measurements of the generation times of 
organisms belonging to several families, each the progeny of a single organism selected at 
the beginning of the experiment. The organisms all lie within a single field of the microscope 
and are subjected therefore to nearly identical environmental conditions. The variance of 
generation times within the experiment must be that proper to the organism under those 
conditions, and the coefficient of variation (a constant for each species by hypothesis) 
should yield an unbiased measure of g. 

Now it remains true of the present results, as well as of Kelly & Rahn’s, that there is 
apparent lack of homogeneity between experiments. An examination of the coefficients of 
variation showed them to be fairly uniformly distributed, except for a few which were out- 
standingly large; the principal contribution to the variance in these cases was provided by 




Table 1. Frequency of generation times, with the first (μ₁′ = τ̄) and second (μ₂)
moments and the measures γ₁, γ₂ of skewness and kurtosis


Columns, in order: generation time (min.); Bact. aerogenes; Bact. coli anaerogenes;
Strep. faecalis; Pr. vulgaris; B. mycoides; B. subtilis series (i); B. subtilis series (ii);
B. subtilis series (iii)

0 — — — — — — — —
2 E — — -- 2 4 1 1 
4 — — — — 4 6 2 2 
6 2 3 1 4 6 8 2 3 
8 3 8 0 4 10 15 6 2 
10 4 vi 0 2 12 13 3 3 
12 13 24 1 5 19 26 13 6 
l4 35 45 4 9 13 24 14 15 
16 48 49 6 7 j 20 32 23 27 
18 75 56 13 20 | 25 34 24 35 
20 72 47 8 34 22 42 25 25 
22 65 53 15 42 27 27 25 40 
24 40 39 9 42 23 39 34 44 
26 42 30 20 4l 17 19 30 32 
28 29 17 12 29 12 22 32 28 
30 9 16 11 34 9 13 28 28 
32 13 8 8 26 | 14 15 19 18 
34 2 3 4 25 5 17 24 14 
36 + 4 3 17 6 8 6 12 
38 1 7 0 19 11 1 11 12 
40 1 0 0 18 9 1 11 и 
42 312 4 1 1 4 3 7 9 
444 0 1 0 4 3 тя 3 1 
46 2 — 0 3 4 — i 5 
48 — — 1 1 1 — 1 1 
50 — — 0 1 1 — 2 
52 — — 1 0 0 — 2 0 
54 — — — 2 0 — 1 1 
56 — — — 2 l — 2 1 
58 — — — 3 0 — 1 0 
60 — — — — 0 — — 2 
62 — — — LI 0 M 2 n: 
64 — ЕЗ — — 1 = — — 
66 — — -— ES 0 RU ae ae 
68 — — — = 0 == — os 
70 — — — — 0 20 24 Ae 
72 — — — 25 0 daz us £u 
74 — — — == 1 an aS - 
76 — — — — — — — -— 
78 — — -— — — — — — 
80 — — — — — — — — 
Σf 462 421* 118 390 282 369 353 381
No. of expts. 14 12 9 19 16 19 18 16
No. of families measured 51 51 59 93 49 43 53 66
μ₁′ 21.05 20.48 24.90 27.95 22.90 20.38 25.72 25.64
μ₂ 32.94 43.98 46.37 75.43 119.76 60.40 89.64 83.99
γ₁ 0.79 0.72 0.67 0.57 0.87 0.18 0.45 0.66
γ₂ 1.78 0.84 2.07 1.25 1.60 −0.46 0.70 1.12


* One organism failed to grow and suffered lysis.




one or (rarely) more families of very unequal generation times. Thus for example with Bact.
coli anaerogenes in an experiment involving four families each of six organisms, the mean
generation times were
26.3, 21.7, 28.3, 27.0;
the variances
51.8, 2.2, 56.6, 135.5;
and the squared coefficients of variation
0.0748, 0.00467, 0.0707, 0.186.

Irregularities of this kind, which were found in all the species examined, appear to make it
doubtful that any simple meaning can be attached to the coefficient of variation of larger
samples.
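The arithmetic of this example is easily reproduced. The short sketch below recomputes the squared coefficients of variation from the quoted means and variances, and adds, purely as an illustration, the value of g that each family would imply on its own if the Type III relation c_v = g^(−1/2) were applied to it.

```python
# Means and variances of generation time for the four families quoted in the text.
means = [26.3, 21.7, 28.3, 27.0]
variances = [51.8, 2.2, 56.6, 135.5]

# Squared coefficient of variation: variance divided by the squared mean.
squared_cv = [v / (mu * mu) for mu, v in zip(means, variances)]

# Illustration only: g implied by each family alone under c_v = g**(-1/2).
implied_g = [1.0 / c for c in squared_cv]

for mu, v, c, g in zip(means, variances, squared_cv, implied_g):
    print(f"mean {mu:5.1f}  variance {v:6.1f}  cv^2 {c:.5f}  implied g {g:7.1f}")
```

The implied values of g range from about 5 to more than 200 within one experiment, which makes plain how unequal the families are.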

In order to test the homogeneity of within-family variance, the Bartlett criterion, often
denoted by M (which for normal samples is distributed approximately as a χ²), was
computed for each experiment in the usual way and the sum, ΣM, formed for each group of
experiments. A correction for kurtosis was applied, given by Box (1953) as

$$\sum M' = \sum\left[M/(1 + \tfrac{1}{2}\gamma_2)\right].$$

This expression was simplified by taking a common corrective factor (1 + ½γ₂) for every
family, so biasing M′ towards smaller values.
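A minimal sketch of the criterion and its kurtosis correction follows. It takes M in its uncorrected form, M = ν log s² − Σ νᵢ log sᵢ², where s² is the pooled variance and νᵢ the degrees of freedom of family i, and it assumes that the four variances quoted earlier for one experiment were each computed on 5 degrees of freedom; both points are assumptions of the illustration rather than statements about how the published figures were obtained.

```python
import math

def bartlett_m(variances, dfs):
    """Bartlett's M = nu * log(pooled variance) - sum(nu_i * log(s_i^2))."""
    total_df = sum(dfs)
    pooled = sum(v * d for v, d in zip(variances, dfs)) / total_df
    return total_df * math.log(pooled) - sum(d * math.log(v) for v, d in zip(variances, dfs))

def corrected_m(m_stat, gamma2):
    """Kurtosis correction with a common factor: M' = M / (1 + gamma2 / 2)."""
    return m_stat / (1.0 + 0.5 * gamma2)

# Four families of six organisms each (assumed 5 degrees of freedom per family).
variances = [51.8, 2.2, 56.6, 135.5]
dfs = [5, 5, 5, 5]

m_stat = bartlett_m(variances, dfs)
m_rahn = corrected_m(m_stat, 2.4)          # Yule's distribution: gamma2 close to 2.4
m_kendall = corrected_m(m_stat, 6.0 / 20)  # Type III with a hypothetical g = 20
print(m_stat, m_rahn, m_kendall)
```

Because γ₂ is much larger under Rahn's hypothesis than under Kendall's for any plausible g, the same M is deflated far more in the former case. Families with zero variance cannot be included, since the logarithm is then undefined.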

Two assumptions were made in assigning γ₂: (i) If the underlying distribution is of Type
III, γ₂ is 6/g. In this case the γ₂ were calculated from the final estimates of Kendall's g given
by curve fitting (Table 9). (ii) If the underlying distribution is Yule's, γ₂ is very nearly 2.4.
The results of the test given in Table 2 show that the apparent heterogeneity is not
significant under Rahn's hypothesis, but is markedly so under Kendall's. To this extent it might
be argued that Rahn’s hypothesis is to be preferred. But examination of the figures in detail
shows that the large contributions to M′ come mainly from families containing members of
unusually short generation time, whereas it is the long upper tail of Yule’s distribution which
inflates the fourth moment; near the origin the ordinates are exceedingly small. There is,
however, no sufficient reason here to suspect Kelly & Rahn's experiments of being
ill-controlled.

In order to assess the extent to which the overall variance of generation time is increased
by inconstant experimental conditions, I have carried out conventional analyses of
variance on the means of families. From Table 3 it can be seen that in every case a significant
part of the total variance is contributed by differences in the growth rate from experiment
to experiment. If σ_w² and σ_e² are the true within- and between-experiment variances, the
observed between-experiment mean square will be an estimate of

$$\sigma_w^2 + \frac{N - \sum n^2/N}{K - 1}\,\sigma_e^2$$

(where K is the number of experiments, N the number of families, and n the number of
families in an experiment). On this basis we derive the figures in the last column of Table 3
as estimates of the experimental variance of the growth rate. Except for Pr. vulgaris,
these estimates of σ_e² are not great; nevertheless, estimates of the coefficient of variation
from the raw data will be too high.
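The estimate of the between-experiment variance can be sketched as below. The sketch assumes the usual expected mean square for unequal group sizes, in which σ_e² is multiplied by n₀ = (N − Σn²/N)/(K − 1), and it uses the mean squares given for Pr. vulgaris in Table 3. The allocation of families to experiments is hypothetical (19 experiments, 91 families), since the individual counts are not given here, so the result is only expected to fall near the tabulated 17.23.

```python
def effective_size(sizes):
    """n0 = (N - sum(n_i^2)/N) / (K - 1) for K experiments with n_i families each."""
    total = sum(sizes)
    return (total - sum(n * n for n in sizes) / total) / (len(sizes) - 1)

def between_variance(ms_between, ms_within, n0):
    """Estimate of sigma_e^2 from the two mean squares; negative values set to zero."""
    return max(0.0, (ms_between - ms_within) / n0)

# Mean squares for Pr. vulgaris as given in Table 3.
ms_between, ms_within = 113.19, 31.59

# HYPOTHETICAL allocation: 15 experiments of 5 families and 4 experiments of 4 families.
sizes = [5] * 15 + [4] * 4

n0 = effective_size(sizes)
sigma_e2 = between_variance(ms_between, ms_within, n0)
print("n0 =", n0, " sigma_e^2 =", sigma_e2)

# Conversely, the tabulated 17.23 corresponds to this value of n0:
print("n0 implied by the table:", (ms_between - ms_within) / 17.23)
```

The value of n₀ implied by the tabulated estimate, about 4.7, is a little below the mean number of families per experiment, as it should be when the experiments are of unequal size.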




Table 2. Modified Bartlett test of homogeneity of variance of families


(A) Kendall hypothesis: γ₂; ΣM′; n; P(ΣM′)

(B) Rahn hypothesis: γ₂; ΣM′; n; P(ΣM′)

Organism                Source of variation   Sum of squares   Degrees of freedom   Mean square   F-ratio and P(F)   σ_e²

Bact. aerogenes         Between expts.         305.95          13                    23.53        2.03
                        Within expts.          429.32          37                    11.60
                        Total                  735.27          50

Bact. coli anaerogenes  Between expts.         314.16          11                    28.56        2.86
                        Within expts.          389.34          39                     9.98
                        Total                  703.50          50

Strep. faecalis         Between expts.         610.18           8                    76.27        2.37
                        Within expts.         1609.21          50                    32.18
                        Total                 2219.39          58

Pr. vulgaris            Between expts.        2037.45          18                   113.19        3.58 (<0.0001)     17.23
                        Within expts.         2274.49          72                    31.59
                        Total                 4311.87          90




By contrast, the coefficients of variation of generation times of families, with the
exception of Pr. vulgaris, are not significantly perturbed. The coefficients themselves (and their
squares) are very unsymmetrically distributed, but the distribution of their logarithms,
as shown by a graphical test, is roughly normal, and analysis of variance carried out on the
logarithms yields the results of Table 4. (For convenience the variable was taken as
100 log₁₀ c_v, to the nearest integer; the constant factors of course cancel in the F-ratio.)
Strep. faecalis again had to be omitted from this test, because many families (each consisting
of a pair of sisters only) had zero variance. However, the test was also carried out on the
squared coefficients of variation themselves; the sense of the results was exactly the same,
and Strep. faecalis gave an F-ratio probability above 0.1.


Table 4. Analysis of variance of family coefficients of variation 


2-94 (0-001) 


3008 


Thus the coefficient of variation is relatively stable as between experiments, but its large 
interfamily variance is difficult to account for on Kendall's hypothesis if, as is supposed, 
the generation time of an organism is independent of its immediate ancestry. It may well be 
a product of ‘delayed fission’; that is, the lapse of an appreciable interval between the division
of nucleus or cytoplasm, and the separation of cell wall. A single large delay in a family
postpones the termination of one organism and the inception of two others, and so may have 
a disproportionate effect on the coefficient of variation. To admit this possibility is to 
entertain the view that the measurements may not be directly relevant to either hypothesis. 

The experiments with Pr. vulgaris were evidently less successful than the rest as regards 
reproducibility of growth rate; this is perhaps to be associated with the organisms' poten- 
tially more complex behaviour on the static medium used. 




Correlation between generation times 
Kelly & Rahn (1932) suggested that some of the more extreme values occurring among 
their measurements were to be accounted for by ‘delayed division’. Such delay would ob- 


Table 5. Correlation between generation times of mothers and daughters (ρ_MD) and sisters (ρ_SS).
Number of pairs of observations, p. Parentheses indicate that the coefficient is not
significantly different from zero at the 5% level


Organism                    ρ_MD       p     ρ_MD corr.   ρ_SS    p     ρ_SS corr.
                                             for bias                   for bias
Bact. aerogenes             (0.069)    360   —            0.77    231   —
Bact. coli anaerogenes      (−0.051)   318   —            0.48    210   —
Strep. faecalis             —          —     —            0.68    59    —
Pr. vulgaris                —          —     —            0.74    195   —
B. mycoides                 −0.18      192   −0.42        0.42    103   0.32
B. subtilis, series (i)     (0.090)    283   (0.016)      0.76    156   0.80
B. subtilis, series (ii)    −0.37      252   −0.34        0.65    154   0.59
B. subtilis, series (iii)   (−0.005)   248   −0.16        0.60    169   0.36


For Bact. aerogenes and Bact. coli anaerogenes ρ_MD is not significantly different from zero,
and the immediate conclusion is that dela


ρ_SS and ρ_MD in favour of positive values; ρ_MD may be appreciably negative, and ρ_SS smaller
than it appears to be. However, inspection of the records leaves no doubt as to the great
similarity between sister organisms.

In the Bacillus group, ρ_MD is either zero or negative, but ρ_SS remains positive to a high
degree of significance. The septate mode of fission appears to be a more highly organized
process than the isthmoid mode, and it probably gives rise to very appreciable delay. Its
contribution to ρ_MD cannot be assessed without knowledge of its variance, though selected
examples suggest that the variance may be extremely large, especially in B. mycoides.
But the differences between the three series of B. subtilis indicate that ρ_MD is far from being
a simple property of the organism. The growth rate in series (ii) and (iii), as will be shown
below, was not constant; this should introduce a positive bias, yet ρ_MD for (ii) is the most
negative of all.


Fig. 3. Abnormal development of Strep. faecalis. (a) Successive appearance of cells.
(b) Corresponding family tree; generation times in minutes.

The estimates of ρ_SS and ρ_MD are affected by the presence of bias in the frequencies. I have
attempted a correction in the following way.

(i) If M and D are the generation times of mother and daughter, and R(M, D) the
observed frequency of the two together, an estimate of the true frequency is

R(M, D)/α(M + D),

where α is the biasing factor of equation (2).

(ii) If S₁, S₂ are the generation times of a pair of sisters, and R(S₁, S₂) the frequency of
the pair, the corrected value is R(S₁, S₂)/α{max (S₁, S₂)}. The ρ calculated from frequencies
corrected in this way are also shown in Table 5. No new regularity appears, and the sense of
the foregoing remarks is not altered.
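The correction amounts to computing a weighted correlation coefficient, each observed pair being given a weight inversely proportional to the biasing factor. A minimal sketch follows. Since equation (2) is not reproduced in this section, the form of α used here, proportional to exp{k(T − τ)} − 1, is an assumption of the sketch, and the pairs of generation times are invented for illustration.

```python
import math

T = 80.0  # duration of an experiment in minutes, as stated in the text

def alpha(tau, k):
    """ASSUMED biasing factor, proportional to exp(k * (T - tau)) - 1 for tau < T."""
    return math.exp(k * (T - tau)) - 1.0

def weighted_correlation(pairs, weights):
    """Product-moment correlation of (x, y) pairs with the given positive weights."""
    total = sum(weights)
    mean_x = sum(w * x for (x, _), w in zip(pairs, weights)) / total
    mean_y = sum(w * y for (_, y), w in zip(pairs, weights)) / total
    sxx = sum(w * (x - mean_x) ** 2 for (x, _), w in zip(pairs, weights))
    syy = sum(w * (y - mean_y) ** 2 for (_, y), w in zip(pairs, weights))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for (x, y), w in zip(pairs, weights))
    return sxy / math.sqrt(sxx * syy)

def corrected_mother_daughter(pairs, k):
    """Each (M, D) pair weighted by 1 / alpha(M + D)."""
    return weighted_correlation(pairs, [1.0 / alpha(m + d, k) for m, d in pairs])

def corrected_sisters(pairs, k):
    """Each (S1, S2) pair weighted by 1 / alpha(max(S1, S2))."""
    return weighted_correlation(pairs, [1.0 / alpha(max(s1, s2), k) for s1, s2 in pairs])

# Hypothetical data (minutes); k corresponds to a doubling time of 22 min.
k = math.log(2.0) / 22.0
mother_daughter = [(20, 24), (18, 26), (30, 16), (22, 22), (26, 18), (16, 28)]
sisters = [(20, 22), (18, 18), (30, 28), (24, 26), (16, 20), (34, 30)]
rho_md = corrected_mother_daughter(mother_daughter, k)
rho_ss = corrected_sisters(sisters, k)
print(rho_md, rho_ss)
```

The weights grow with the total time a pair needs in order to be observed at all, so that pairs with long generation times, which are under-represented in an experiment of fixed length, count for more.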

Once delayed fission is admitted as a possible contribution to the total variance of
generation time, it becomes a matter of importance to assess its variance, since it may turn out
so large as to render nugatory any attempt on the present lines to test the hypotheses of
Kendall and Rahn. A single overt example of delay was found among the observations on
Strep. faecalis. Fig. 3a shows the appearances; a single cell, instead of dividing when it had
reached the usual size, continued to grow until its length was about three times its diameter.
It then split into a sphere and short rod, and fission of the rod followed in 6 min. Fig. 3b is
the family tree, with the times of fission placed so as to correspond horizontally with
Fig. 3a. This example is sufficient to engender suspicion that minor degrees of delay may be
much more frequent.


The generation time distributions
(a) B. mycoides and B. subtilis


Although no quantitative conclusions relevant to the hypotheses of Kendall and Rahn
can be drawn from the data on these species, the observations have an important bearing
on the theory of fission. The results of fitting the modified Type III and Yule distributions

Table 6. Frequency function parameters and goodness of fit


Frequency function                        B. mycoides   B. subtilis   B. subtilis   B. subtilis
                                                        series (i)    series (ii)   series (iii)

Pearson Type III with
allowance for bias (eq. (7)):   g         4.07          6.11          7.01          7.47
                                m         7.04          3.81          4.21          3.90
                                χ²        19.2          57.3          17.8          15.4
                                n         17            15            16            16
                                P(χ²)     0.32          0.000         0.34          0.50

Yule, with allowance for bias
(eq. (8)):                      g         4.64          9.48          —             —
                                m         13.3          8.18          —             —
                                χ²        22.2          74.2          —             —
                                n         18            14
                                P(χ²)     0.22          0.000


(equations (7) and (8)) are given in Table 6 (see also Fig. 6). Series (i) of B. subtilis relates
to observations begun at 2½ hr. from inoculation, series (ii) at 4½ hr., series (iii) to observa-


generation time and length at inception or termination is not exact. The first few vegetative
organisms to grow from the spore are often very long, upwards of 30 μ, and although the
mean length later diminishes, the change is not so well marked as in say Bact. aerogenes at
the beginning of the logarithmic growth phase. B. mycoides cannot be said to possess a
normal mean length; the organisms become progressively shorter over a period which may
be as long as 24 hr. in some media. While a diminution in mean length is occurring, the
number growth rate must be greater than the mass growth rate, and since it is initially less,
there is acceleration at some stage. I have therefore checked the uniformity of growth
during measurements of generation time in the following way.

It will be recalled that all the experiments grouped into any one column of Table 1 were
begun at the same epoch in the life of the culture, and each occupied 80 min. Number growth
curves were constructed by adding together the number of organisms under observation
at corresponding times in each experiment. For B. mycoides and B. subtilis series (i) the
logarithm of this sum was a linear function of time (Fig. 4), but for B. subtilis series (ii) and
(iii), there was marked curvature especially over the earlier part of the period. Numerical
evaluation of the age distribution by means of equation (4) was also carried out, and
compared with the observed age distribution at the 80 min. epoch; series (ii) and (iii) showed
an obvious excess of young organisms, as would be expected from an increasing growth rate.
These two series are therefore of no quantitative value.

Fig. 4. Total number of organisms (logarithmic scale) against time from beginning of
experiments (minutes).

Fig. 5. B. mycoides growing on cellophane, 5 hr. from inoculation. A single cell has
split off from one end of a long organism.

The extraordinarily wide range of generation times in B. mycoides (Table 1) is a
concomitant of its readiness to divide at any available point (i.e. between any pair of cells).
The exaggerated appearance of Fig. 5 is by no means a rarity; the frequency with which
a single cell is split off from an organism of eight or more cells is diagnostic of the species.
Such single-celled organisms are viable, but develop rather slowly, and some do not divide
within 3 hr. of their inception; they are not spores (cf. Bergersen, 1954). B. subtilis shows
something of the same freedom in its earlier stages of growth, but it reverts more quickly
to regularity. Clearly, in these species the generation time is intimately connected with the
coarser structure of the organism, and only remotely with the nuclear fission.

The distribution of B. subtilis at 2½–4 hr. (series (i)) is notable for its approximate
symmetry (Table 1), in which respect it is quite outstanding; neither the Type III nor Rahn’s
distribution gives an acceptable fit.

The experiment of growing B. subtilis over blood was an attempt to test the constancy of
the coefficient of variation of generation time. Pearce and I had found (1951) that in these
circumstances the organisms were very short and uniform in size, which suggested that the
dispersion would also be small. The blood used in the present work was diluted with saline
in order to reduce its viscosity, and over this mixture long organisms were first formed as
usual. The mean length gradually diminished, but at the intermediate stages both long and
short organisms were present, each clone being predominantly of one sort or the other. As
I have shown, the growth rate increased considerably during the course of observation.
Despite the imperfection of the experimental conditions, the coefficient of variation (0.36)
was rather less than in the other two series (0.40 and 0.37).

Fig. 6. Generation time distributions of B. mycoides and B. subtilis,
with modified Pearson Type III frequency functions.

These rather disconnected remarks indicate that the behaviour of B. mycoides and 
B. subtilis is so complex that the experiments are quite insufficient to expose it fully. The 
distributions, however, possess one regular feature worth examination. 

Fig. 6 shows that the frequency of small generation times in B. mycoides and B. subtilis 
series (i) is greater than the expectation accorded by the assumed frequency function (a 
modified Type III). This is true also for B. subtilis series (ii) and (iii) (and has since been 
observed in B. megatherium). The observed frequencies near the origin are uniformly so 
large as to suggest that the true frequency function approaches zero with a non-zero slope 
(I have pointed out that the observations are less liable to error in this region than
elsewhere). In no case did two successive fissions occur in the same 2 min. interval. It will be 
seen from Fig. 6 that no Pearson Type III or Yule function could give a tolerable fit over the 


E. О. POWELL 35 
higher range without being in defect at low values of τ. Both of these functions have a 
finite, non-zero slope at the origin only when g = 2, and then the slopes are 1/m² and 2/m² 
respectively. 


But the distributions with g = 2 have values of γ₁ and γ₂ quite different from those
observed. Kelly & Rahn's results for B. cereus are equally suggestive; their grouped frequencies, 
in order of increasing mean τ, are 9, 2, 4, 5, 17, .... 
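
The behaviour of the two frequency functions at the origin can be checked numerically. The following is a minimal sketch, assuming the Type III form τ^(g-1) e^(-τ/m) / (m^g Γ(g)) and the Yule form (g/m) e^(-τ/m) (1 - e^(-τ/m))^(g-1); the function names and the value m = 10 are hypothetical and chosen only for illustration.

```python
import math

def type3_pdf(tau, g, m):
    """Pearson Type III (gamma) density with index g and scale m (assumed form)."""
    return tau ** (g - 1) * math.exp(-tau / m) / (m ** g * math.gamma(g))

def yule_pdf(tau, g, m):
    """Yule density (g/m) exp(-tau/m) (1 - exp(-tau/m))**(g-1) (assumed form)."""
    z = math.exp(-tau / m)
    return (g / m) * z * (1.0 - z) ** (g - 1)

def slope_at_origin(pdf, g, m, h=1e-6):
    """One-sided finite-difference estimate of the slope of pdf at tau = 0."""
    return (pdf(h, g, m) - pdf(0.0, g, m)) / h

m = 10.0  # hypothetical scale, in minutes
type3_slope = slope_at_origin(type3_pdf, 2, m)  # approximately 1/m**2
yule_slope = slope_at_origin(yule_pdf, 2, m)    # approximately 2/m**2
flat_slope = slope_at_origin(type3_pdf, 3, m)   # approximately 0 for g > 2
```

For g greater than 2 both slopes vanish at the origin, which is why neither function can follow an excess of very short generation times without g being forced down to 2.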

Among the unicellular organisms, one instance (Strep. faecalis, τ = 6 min.) has already 
been noted of a generation time which was unusually short because of delayed fission in the 
previous generation. It is perhaps significant that this strain, like the Bacilli, exhibited the 
septate mode of fission. But it cannot be said of B. mycoides that the shortest generation 
times are associated with anomalous development, for reasons already given. Many 
instances of short generation time in B. subtilis can, however, be accounted for as resulting 
from an unusual succession of fissions. Fig. 7a represents the usual succession; an organism 
about to divide has typically three completed party walls (represented by pairs of dots). 
At a time t₀ it divides at or near its centre, and the two daughters similarly at t₁ and t₂. For 
simplicity, the organism is shown as dividing without growing. The intervals t₁ − t₀ and 


Fig. 7. Normal (a) and abnormal (b) succession of fissions in B. subtilis. 


t₂ − t₀ are not generally extreme. Occasionally (Fig. 7b) the first division occurs at a
distance of about one-quarter of the length of the organism from one end, and a second division 
follows very shortly; then the interval between the two is small. That half of the organism to the right 
of the arrow in Fig. 7b can be usefully regarded as an organism manqué with a negative 
generation time t₀ − t₁, i.e. the associated separation of cell walls has occurred in the opposite 
order to the corresponding separation of cytoplasm. 

I believe that the impression given by the generation time distributions of B. mycoides 
and B. subtilis, namely, that the slope does not vanish near the origin, is a just one, and that 
the phenomenon is of fundamental significance for the mechanism of fission in Bacillus; 
I hope to deal with it more fully in another communication. 


(b) Bact. aerogenes, Bact. coli anaerogenes, Strep. faecalis, Pr. vulgaris; Assessment of the 
genetic parameter 
The result of fitting the Type III and Yule distributions to the observations on unicellular 
organisms is shown in Table 7 A-D. 


Comparison of sections A and C shows at once that the Type III distribution fits the data 
better than does Yule’s, in each separate case. Moreover, the method of moments gives 
(section B) substantially the same results as the method of maximum likelihood. Applied 
to Yule’s distribution, it gives a rather better fit, with very different values of the para- 
meters (section D). This disagreement, with the moment fit better than the maximum 
likelihood, seems to indicate that Yule’s distribution is inappropriate to the raw data. 

However, the analysis of variance has shown that the crude coefficient of variation (c) 
of τ must exceed the supposed true and constant value (c₀) because of the experimental 
dispersion of the mean, and so the method of moments (Table 7 B and D) underestimates g, 
sampling fluctuations apart. As a further consequence, it does not follow from these results 
that Kendall’s hypothesis is to be preferred to Rahn’s. 

Kelly & Rahn's results on Bact. aerogenes have been shown by Finney & Martin (1951) 
to be in satisfactory agreement with Rahn's hypothesis so far as the χ² test can distinguish. 
I have already pointed out that there is some doubt as to the correct assignment of mean 
generation time to the grouped frequencies, but with the same assignment as that adopted 
by Finney & Martin, viz. 

    τ = 2½   7½   12½   17½
    f = 0    1    7     61


I find that the Type III curve fits less well than Rahn's and shows no advantage in its 
estimate of c and γ₁ (Table 8). With other assignments of generation times to frequencies 


    τ = 2½   5    10    15
    f = 0    1    7     61

and

    τ = 5    10   15    20
    f = 0    1    7     61


the estimated parameters are different, but the sense of the results is the same. 

Hypothesis requires that the genetic parameter (and therefore the true coefficient of 
variation of τ) shall remain constant despite variations in growth rate. Finney & Martin 
(1951) pointed out that the best estimate of it would be obtained by fitting frequency 
functions to the data for each experiment separately, constraining g to be the same for each, 
but allowing m to differ as required from experiment to experiment. As they realized, this 
would be an exceedingly laborious process with Rahn’s frequency function, but it turns 
out that for the Type III function a simple maximum-likelihood solution exists. 

It is convenient to write the Type III function in the form 

$$dF = \frac{g^{g}\,\tau^{g-1}\,e^{-g\tau/a}}{a^{g}\,\Gamma(g)}\,d\tau,$$

where the new parameter a is the mean of the distribution, equal to gm. Let $a_r$ be the value 
of a to be assigned for the rth experiment. Then the log-likelihood for that experiment is 

$$\log L_r = (g-1)\,\Sigma(f\log\tau) - g\,\Sigma f\tau/a_r - g(\log a_r)\,\Sigma f + g(\log g)\,\Sigma f - \{\log\Gamma(g)\}\,\Sigma f,$$

where the f are the observed frequencies of τ in the rth experiment. 


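On summing over all experiments and differentiating we obtain the desired
maximum-likelihood estimate of g: 

$$n\left\{\frac{\Gamma'(\hat g)}{\Gamma(\hat g)} - \log\hat g\right\} = \sum_r \Sigma(f\log\tau) - \sum_r (\log\hat a_r)\,\Sigma f, \qquad (9)$$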




Table 7. Frequency function parameters and goodness of fit 

Frequency function and method of fitting 

(A) Pearson Type III, maximum likelihood:    g       13.4
                                             m       1.57
                                             a       16.1
                                             n       11
                                             P(χ²)   0.14
(B) Pearson Type III, moments:               g       13.4
(D) Yule, moments:                           g, m, a, n, P(χ²)
(E) Pearson Type VI, first two moments and σ²_ā:    g, s, b, ā, n, P(χ²)
(F) Yule-hyperbolic (eq. (14)), first two moments and σ²_ā:    g

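where $\hat a_r = \Sigma f\tau / \Sigma f$, 
and n is $\sum_r (\Sigma f)$, the total number of observations. The mixed derivatives 

$$\frac{\partial^2}{\partial g\,\partial a_r} \quad\text{and}\quad \frac{\partial^2}{\partial a_r\,\partial a_s} \quad (r \neq s)$$

of $\sum_r \log L_r$ are all zero (this is the virtue of using the parameter a instead of m), and so, as 
in the simple case, the asymptotic variance of ĝ can be estimated as 

$$\operatorname{var}\hat g = 1 \Big/ n\left\{\frac{d^2 \log\Gamma(\hat g)}{d\hat g^2} - \frac{1}{\hat g}\right\}.$$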
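
The pooled estimate of equation (9) is easily computed. The following is a minimal sketch, not the arithmetic actually used for Table 9: it assumes raw (ungrouped) generation times for each experiment, evaluates d log Γ/dg and its derivative by a recurrence and an asymptotic series, and solves for g by bisection. The function names and the simulated data (true g = 12, three experiments with means 20, 25 and 30) are hypothetical.

```python
import math
import random

def digamma(x):
    """d log Gamma(x)/dx: recurrence up to x >= 6, then an asymptotic series."""
    total = 0.0
    while x < 6.0:
        total -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return total + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """Second derivative of log Gamma(x), by the same device."""
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return total + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def pooled_g(experiments):
    """Common g with a separate mean per experiment; returns (g_hat, variance)."""
    n = sum(len(e) for e in experiments)
    rhs = 0.0
    for e in experiments:
        a_r = sum(e) / len(e)  # maximum-likelihood mean of this experiment
        rhs += sum(math.log(t) for t in e) - len(e) * math.log(a_r)
    target = rhs / n  # negative unless all times within each experiment are equal
    lo, hi = 1e-3, 1e6
    for _ in range(200):  # digamma(g) - log(g) increases with g
        mid = math.sqrt(lo * hi)
        if digamma(mid) - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    g_hat = math.sqrt(lo * hi)
    var_g = 1.0 / (n * (trigamma(g_hat) - 1.0 / g_hat))
    return g_hat, var_g

random.seed(1)
true_g = 12.0
experiments = [[random.gammavariate(true_g, a / true_g) for _ in range(2000)]
               for a in (20.0, 25.0, 30.0)]
g_hat, var_g = pooled_g(experiments)
```

Because the means are estimated separately, differences in growth rate between experiments do not inflate the apparent dispersion, which is the point of the constraint on g alone.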


Table 8. Comparison of Pearson Type III and Yule distributions 
applied to Kelly & Rahn's results for Bact. aerogenes 

    c       γ₁
    0.30    0.61
    0.33    1.18
    0.31    0.91


The results of applying equation (9) to the data for the unicellular organisms are shown in 
Table 9 E. The g values obviously differ significantly from species to species, and all are 
less than 20, the approximate figure arrived at by Kendall (1948) from a consideration of 
Kelly & Rahn's experiments on Bact. aerogenes. In spite of the fact that the variance of τ 
in the present measurements on that organism is less than in theirs, the value of g is still 
only 16.5; Kendall, of course, was not aware that the apparent heterogeneity of the data was 
largely intrinsic. 

Although the preceding form of treatment is not available for Rahn's hypothesis, a 
satisfactory approach can in any case be made through the coefficient of variation. 
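
Under Rahn's hypothesis the coefficient of variation by itself fixes g, because a Yule variate with index g behaves as the largest of g independent exponential times and so has mean mΣ(1/k) and variance m²Σ(1/k²), the sums running from k = 1 to g. The following is a minimal sketch of the inversion, assuming integer g; the function names are hypothetical, and because the search works from a rounded coefficient it may differ by a unit or two from a value derived from unrounded figures.

```python
import math

def rahn_cv(g):
    """Coefficient of variation of the Yule distribution with integer index g."""
    h1 = sum(1.0 / k for k in range(1, g + 1))        # mean in units of m
    h2 = sum(1.0 / (k * k) for k in range(1, g + 1))  # variance in units of m**2
    return math.sqrt(h2) / h1

def rahn_g(c0, g_max=100_000):
    """Smallest integer g whose coefficient of variation does not exceed c0."""
    h1 = h2 = 0.0
    for g in range(1, g_max + 1):
        h1 += 1.0 / g
        h2 += 1.0 / (g * g)
        if math.sqrt(h2) / h1 <= c0:
            return g
    raise ValueError("c0 is too small for the search range")

g_example = rahn_g(0.257)  # a coefficient of about 0.26 implies g near 80
```

The coefficient falls only logarithmically with g, so small errors in the coefficient of variation produce large changes in the implied number of genes.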

A number of generation time experiments carried out on one species can be considered 
to provide a sample of frequency functions h(τ, a, c₀) drawn from a population in which the 
true coefficient of variation c₀ = σ/a is by hypothesis constant, but the mean a is dispersed. 
In this hypothetical population of h-functions, let j(a) be the frequency function of the mean. 
Then considering the whole series of experiments together, the expectation of an
observation τ will be 

$$dF = \int_{a_l}^{a_u} h(\tau, a, c_0)\, j(a)\, da\, d\tau, \qquad (10)$$

where $a_u$ and $a_l$ are the limits within which a must lie. The moments about zero are then 

$$\mu_r' = \int_0^\infty \int_{a_l}^{a_u} \tau^r\, h(\tau, a, c_0)\, j(a)\, da\, d\tau.$$

It can safely be assumed that the order of integration may be inverted, so that 

$$\mu_r' = \int_{a_l}^{a_u} j(a) \int_0^\infty \tau^r\, h(\tau, a, c_0)\, d\tau\, da.$$




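In particular, 

$$\mu_1' = \int_{a_l}^{a_u} a\, j(a)\, da = \bar a, \ \text{say},$$

$$\mu_2' = \int_{a_l}^{a_u} j(a)\, a^2 (c_0^2 + 1)\, da = (c_0^2 + 1)\, \mu_2'(j), \ \text{say}.$$

Hence the observed crude coefficient of variation, c, is related to c₀ by 

$$c^2 = \mu_2' / \mu_1'^2 - 1 = (c_0^2 + 1)\, \mu_2'(j) / \bar a^2 - 1.$$

(Evidently c > c₀ always, since μ₂′(j) > ā².) Or, more symmetrically, writing $c_j$ for the
coefficient of variation of the j-distribution, 

$$c^2 + 1 = (c_0^2 + 1)(c_j^2 + 1). \qquad (11)$$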


Thus from a crude coefficient c a corrected value c₀ can be calculated if the experimental 
variance of the mean is known, always under the hypothesis that c₀ is constant. The
necessary figures are already available from the working of the analysis of variance; the σ²_ā of 
Table 3 are estimates of μ₂(j) and 

$$\mu_2'(j) = \bar a^2 + \sigma_{\bar a}^2.$$


Application of equation (11) then yields the figures in Table 9 B. 

It is to be noted that the relation between g and c, so calculated is determined by the 
h-distribution, and is independent of j(a). 

If from the corrected values c₀ new estimates of Kendall's g (= 1/c₀²) are calculated, the 
results agree well with the revised maximum likelihood estimates (Table 9 C and E), and so 
the assumption that the h-distribution is of Type III is not an unreasonable one. 
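
The correction of equation (11) and the resulting estimate of Kendall's g amount to two lines of arithmetic. The following is a minimal sketch; the function names are hypothetical, the coefficient of variation of the mean (0.09) is an assumed figure used only to exercise the formula, and the corrected coefficients are those quoted in Table 9 (B).

```python
import math

def corrected_cv(c, c_j):
    """Solve c**2 + 1 = (c0**2 + 1) * (c_j**2 + 1) for c0 (equation (11)).

    Requires c_j <= c; otherwise no real corrected coefficient exists.
    """
    return math.sqrt((c * c + 1.0) / (c_j * c_j + 1.0) - 1.0)

def kendall_g(c0):
    """For a Type III distribution the squared coefficient of variation is 1/g."""
    return 1.0 / (c0 * c0)

# Hypothetical illustration: crude c = 0.273 with an assumed c_j = 0.09.
c0_example = corrected_cv(0.273, 0.09)

# Corrected coefficients quoted in Table 9 (B), in column order.
table9_c0 = [0.257, 0.306, 0.251, 0.277]
g_values = [kendall_g(c0) for c0 in table9_c0]
```

The values so obtained lie within about 0.1 of the entries of Table 9 (C); the small differences reflect the rounding of c₀ to three figures.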

It is now possible to derive also improved values of Rahn's g from 

$$c_0^2 = \frac{\psi'(1) - \psi'(g+1)}{\{\psi(g+1) - \psi(1)\}^2},$$

where ψ(x) = d log Γ(x)/dx. These values are given in Table 9 D. 

Equation (10) suggests a method of developing modified frequency distributions capable 
of representing the crude data more accurately, and of being subjected to a meaningful 
significance test. Each series of experiments furnishes a number of values $a_r$ and a variance 
σ²_ā. The exact distribution of a is not known, but a graphical examination of the individual 
$a_r$ shows a marked modal tendency. Suppose then, for convenience, that a is distributed 
as a Pearson Type V variate: 

$$j(a) = \frac{b^{s}\, e^{-b/a}}{\Gamma(s)\, a^{s+1}},$$

with overall mean $\bar\tau = \bar a = b/(s-1)$, 
and $\sigma_{\bar a}^2 = \mu_2(j) = b^2 / \{(s-1)^2 (s-2)\}$. 
Then if the distribution h(τ, a, c₀) is of Type III, equation (10) is easily integrated, and 
becomes 

$$dF = \frac{(b/g)^{s}\, \tau^{g-1}}{B(g, s)\, (\tau + b/g)^{g+s}}\, d\tau, \qquad (12)$$

where B(g, s) is the complete beta-function. This is a Pearson Type VI distribution. 
I have fitted equation (12) to the raw data by using the first two moments and the 
variance σ²_ā of the mean, to determine b, s and g (Table 7 E; the values of g are of course the 



same as those in Table 9 C, since the j-distribution is irrelevant). To neglect σ²_ā and use the 
first three moments would omit part of the information available, viz. the knowledge of 
how the τ were grouped by experiments. The goodness of fit is no better than that of the 
simple Type III function. One of the fitted curves (Pr. vulgaris, for which σ²_ā is largest) 
is shown in Fig. 8 (curve (i)). 
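
That a Type III function averaged over a Type V distribution of the mean gives a Type VI function can be confirmed by direct quadrature. The following is a minimal sketch, assuming the closed form (b/g)^s τ^(g-1) / {B(g, s)(τ + b/g)^(g+s)} for the result; the function names and the parameter values (g = 13, s = 40, b = 780, giving an overall mean of 20) are hypothetical.

```python
import math

def type3_pdf(tau, g, a):
    """Type III density with index g and mean a."""
    return math.exp(g * math.log(g) + (g - 1) * math.log(tau) - g * tau / a
                    - g * math.log(a) - math.lgamma(g))

def type5_pdf(a, b, s):
    """Pearson Type V density b**s exp(-b/a) / (Gamma(s) a**(s+1))."""
    return math.exp(s * math.log(b) - b / a - math.lgamma(s) - (s + 1) * math.log(a))

def type6_pdf(tau, g, b, s):
    """Assumed closed form of the mixture (a Pearson Type VI density)."""
    log_beta = math.lgamma(g) + math.lgamma(s) - math.lgamma(g + s)
    return math.exp(s * math.log(b / g) + (g - 1) * math.log(tau) - log_beta
                    - (g + s) * math.log(tau + b / g))

def mixture_pdf(tau, g, b, s, lo=1.0, hi=200.0, steps=20000):
    """Trapezoidal approximation to the integral over the mean a."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        a = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * type3_pdf(tau, g, a) * type5_pdf(a, b, s)
    return total * h

g, b, s = 13.0, 780.0, 40.0
closed = type6_pdf(18.0, g, b, s)
numeric = mixture_pdf(18.0, g, b, s)
```

The limits of integration are chosen wide enough that the neglected tails of the Type V density are immaterial for these parameter values.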


Table 9. The crude (c) and corrected (c₀) coefficients of variation, 
with final estimates of the genetic parameter (g) 

                                  Bact.         Bact. coli     Strep.        Pr.
                                  aerogenes     anaerogenes    faecalis      vulgaris
(A) c                             0.273         0.324          0.273         0.319
(B) c₀                            0.257         0.306          0.251         0.277
(C) g (Kendall) from c₀           15.1          10.7           15.9          13.1
(D) g (Rahn) from c₀              79            36             92            55
(E) g (Kendall) by maximum
    likelihood                    16.5 ± 1.1    10.9 ± 0.7     18.6 ± 2.3    14.4 ± 1.0


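Another convenient choice of j(a) is to take it as rectangular with mean ā and range 
2√3 σ_ā. Then, still with a Type III distribution for h(τ, a, c₀), equation (10) becomes 

$$dF = \frac{g}{2\sqrt3\, \sigma_{\bar a}\, (g-1)} \left\{ I\!\left(\frac{g\tau}{\bar a - \sqrt3\, \sigma_{\bar a}},\ g-1\right) - I\!\left(\frac{g\tau}{\bar a + \sqrt3\, \sigma_{\bar a}},\ g-1\right) \right\} d\tau, \qquad (13)$$

where I(x, p) is the incomplete gamma function 

$$I(x, p) = \frac{1}{\Gamma(p)} \int_0^x e^{-u}\, u^{p-1}\, du.$$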

This distribution, fitted to the Pr. vulgaris data, is also shown in Fig. 8 (ii). The χ² is 29.5, 
as against 30.1 for the Type VI, and the close geometrical similarity of the two curves 
shows that it matters little what functional form is chosen for j(a). A fortiori this is true for 
the other organisms, for which σ²_ā is much smaller. 

Yule's distribution does not combine readily, in equation (10), with any of the standard 
distributions. However, after the result of the previous paragraph, we may take 

$$j(a) = A/a \qquad (l_2 > a > l_1),$$

i.e. j(a) is a segment of a rectangular hyperbola. Then 

$$A = 1/\log(l_2/l_1), \qquad \bar a = A(l_2 - l_1), \qquad \sigma_{\bar a}^2 + \bar a^2 = \mu_2'(j) = \tfrac12 A(l_2^2 - l_1^2).$$

Equation (10) becomes 

$$dF = \int_{l_1}^{l_2} \frac{A}{a} \cdot \frac{gY}{a}\, e^{-\tau Y/a}\, (1 - e^{-\tau Y/a})^{g-1}\, da\, d\tau,$$
һ ы 




where $Y = \Gamma'(g+1)/\Gamma(g+1) - \Gamma'(1)/\Gamma(1)$; by equation (5), a = mY, and by hypothesis Y is constant. 
Integration is immediate on writing $z = e^{-\tau Y/a}$: 

$$dF = \frac{A}{\tau} \left\{ (1 - e^{-\tau Y/l_1})^{g} - (1 - e^{-\tau Y/l_2})^{g} \right\} d\tau. \qquad (14)$$

Like the Type VI, this distribution fits no better than the parent Yule distribution with 
smaller g (Table 7 D and F). 
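
The integration leading to equation (14) can likewise be checked by quadrature. The following is a minimal sketch, assuming integer g (so that Y is the harmonic sum 1 + 1/2 + ... + 1/g) and the closed form (A/τ){(1 - e^(-τY/l₁))^g - (1 - e^(-τY/l₂))^g}; the function names and the values g = 50, l₁ = 15, l₂ = 25 are hypothetical.

```python
import math

def harmonic(g):
    """Y for integer g: the sum of 1/k for k = 1, ..., g."""
    return sum(1.0 / k for k in range(1, g + 1))

def yule_pdf(tau, g, a):
    """Yule density with integer index g and mean a, so that m = a / Y."""
    y = harmonic(g)
    z = math.exp(-tau * y / a)
    return (g * y / a) * z * (1.0 - z) ** (g - 1)

def yule_hyperbolic_pdf(tau, g, l1, l2):
    """Assumed closed form of the Yule function averaged over j(a) = A/a."""
    big_a = 1.0 / math.log(l2 / l1)
    y = harmonic(g)
    upper = (1.0 - math.exp(-tau * y / l1)) ** g
    lower = (1.0 - math.exp(-tau * y / l2)) ** g
    return (big_a / tau) * (upper - lower)

def mixture_pdf(tau, g, l1, l2, steps=10000):
    """Trapezoidal approximation to the integral of (A/a) * yule_pdf over a."""
    big_a = 1.0 / math.log(l2 / l1)
    h = (l2 - l1) / steps
    total = 0.0
    for i in range(steps + 1):
        a = l1 + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (big_a / a) * yule_pdf(tau, g, a)
    return total * h

closed = yule_hyperbolic_pdf(20.0, 50, 15.0, 25.0)
numeric = mixture_pdf(20.0, 50, 15.0, 25.0)
```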


Fig. 8. Generation time distribution of Pr. vulgaris with fitted curves. 
(i) Pearson Type VI, (ii) distribution of equation (13). 
(Horizontal axis τ in minutes.)


As in the Bacillus group, there is in all four species an excess of observed over expected 
frequencies near the origin (Table 10); this is true for all the distributions of Table 7. The 
organisms of very short generation time are well distributed among the several experiments 
carried out on each species; thus the ten Pr. vulgaris organisms of τ < 10 (see Fig. 8) are 
distributed among five experiments; they are not associated with a single experiment of 
exceptionally low mean τ. And it is evident that if the restriction, c₀ = constant, is
maintained, only extravagant dispersion of the mean could appreciably increase the ordinates 
of the fitted curves near the origin. Qualitatively, this is just such an effect as might result 
from the occurrence of occasional large delays in fission; because of the skewness of the 
distribution the presence of a short-lived daughter organism would be much more obvious 
than that of its long-lived mother. 

On the score of goodness of fit, therefore, Kendall's hypothesis is to be preferred to Rahn's, 
but the proposed frequency functions do not represent the observations altogether
satisfactorily. The large scatter of within-family variance (Table 2) is admissible only on Rahn's 




hypothesis, but it is associated with the presence of an excess of organisms of very short 
generation time, and cannot be solely accounted for by the long upper tail of the Yule 
distribution. 


Table 10. Observed and expected frequencies of short generation times 
(The range of τ is, for uniformity, taken to be half the distance of the mode from the origin.) 

                                         Bact.        Bact. coli     Strep.      Pr.
                                         aerogenes    anaerogenes    faecalis    vulgaris
Range of τ                               0-9          0-9            0-11        0-11
Observed frequency                       5            11             1           10
Expected, Type III fitted by moments     1.7          6.8            0.6         3.4
Expected, Yule fitted by moments         0.04         1.6            0.02        0.4
Expected, Type VI                        1.4          5.9            0.5         2.2
Expected, Yule-hyperbolic (eq. (14))


CONCLUSIONS 


The values of the corrected coefficients of variation in Table 9, and the corresponding 
estimates of the ‘genetic’ parameter g constitute the principal quantitative outcome of 
this study. Other features of the pattern of generation times, though less precisely expres- 
sible, are not less important. 

Kendall's g is nowhere as much as 20, and obviously cannot be identified with the number 
of genes in the organism; Kendall himself (1952a) does not in fact insist on any particular 
interpretation. Rahn's g ranges up to about 100, which is much larger than the estimate of 
Finney & Martin (25 for Bact. aerogenes); however, private opinion among geneticists 
appears to indicate that this number is still absurdly low, and their view certainly seems 
justified by the wide range of genetically determined properties already known in Bact. coli 
and Neurospora crassa for example. 

The observations on B. mycoides and B. subtilis, though possessing a specialized interest 
of their own, are not at present susceptible of any simple interpretation, however tentative. 

It is very likely that a mechanism of Kendall's type, i.e. a stepwise process, does occur 
during the fission of an organism, and there is no a priori reason why it should not also be 
preceded or accompanied by Rahn’s gene-duplication process. Then the one will be mani- 
fested clearly only if it is slow enough, relative to the other, to be the dominant factor in 
determining generation time. If neither is dominant (and this is a real possibility) the 
experimental picture will be confused, though it may not be recognized to be so. As things 
stand, Kendall's hypothesis is to be preferred, but only on the ground that its formal 
expression as a Pearson Type III distribution is in fair accord with the data. Kendall and 
Waugh (Kendall, 1952a) have also proposed a modification of Kendall's original hypothesis 
by relaxing the condition that the duration of the primitive steps of his fission process shall 
have the same frequency function for all. The modified hypothesis subsumes Rahn's as 
a special case (though only formally, not structurally). I have not examined its practical 
consequences in detail, but it has one significant feature pointed out by Kendall: suppose 
g is estimated by fitting a Type III distribution to a set of data, then if the primitive steps 
are not all distributed alike, their number will be greater than g. 




It would obviously be possible to test Rahn's hypothesis by measuring generation times 
of the same species at different temperatures and when growing on different media; the 
coefficient of variation should remain constant. Under Kendall’s more reserved hypothesis, 
change of temperature would be expected to make no difference, but his g might well depend 
on the number of synthetic processes demanded by the laying down of nuclear matter or 
cell wall, for instance, and so might depend in turn on the chemical complexity of the 
nutrients offered. Further, a similar study of a wide range of organisms, including auto- 
trophs and the most exigent pathogens, would show whether or not there is any correlation 
between the dispersion of generation time and richness of capacity and structure. Rahn 
(1932), reaching to the mammals for an extreme comparison, suggests that there is such a 
correlation. 

But I do not think that studies of this kind would be at all profitable at present. There 
are serious objections not only to the acceptance of either hypothesis, but also to the 
acceptance of the data as critical of either: defects in the frequency functions, correlation 
between generation times of sister cells, heterogeneity of variance. The hypotheses, at least 
in their primary intention, relate to nuclear processes whose effect is seen only at two or 
three removes. At every point of difficulty in the foregoing discussions the possibility 
naturally suggests itself that delayed division plays an appreciable part in the dispersion 
of generation time; that the recognizable termination of the cell succeeds the essential 
determinative process by an interval which is itself sensibly dispersed. Therefore, the 
immediate need is for improvement in technique. The use of ultraviolet illumination should 
enable nuclear fission to be seen directly, and so permit the generation time so-called to be 
analysed into its components. Extended study of cultures on cellophane over a flowing 
medium would also be valuable; when constant growth rate can be reliably maintained over 
many generations, it will be possible to calculate the dispersion of generation time simply 
from the dispersion of clone size, with great economy in time and patience (Kendall, 1948). 
Further mathematical work is also in progress (see Kendall, 1952b). 


SUMMARY 

1. The conditions are discussed which must be met in any attempt to measure a frequency 
distribution of generation times of micro-organisms. 

2. Generation times of individual organisms of six species have been measured: Bacterium 
aerogenes, Bact. coli anaerogenes, Streptococcus faecalis, Proteus vulgaris, Bacillus subtilis, 
B. mycoides. 

3. The hypotheses developed by Kendall and by Rahn each connect dispersion of 
generation time with a postulated mechanism of fission; each implies a definite mathematical 
form for the distribution. Comparison with experiment suggests that Kendall's is to be 
preferred. 

4. In Rahn's hypothesis, one of the parameters of the distribution is identified with the 
number of genes in the organism. When every allowance is made for experimental error, 
and in the most favourable case (Strep. faecalis), the estimated number of genes is less 
than 100. 

5. Generation times are not distributed at random, but are in part determined by a weak 
and variable hereditary factor effective only over a few generations. 

6. There are positive objections to the acceptance of either Kendall’s or Rahn’s hypo- 
thesis: correlation between generation times of sister cells, heterogeneity of variance in 




small families of organisms, certain defects in the proposed distribution functions. These 
difficulties, and the effect mentioned under (5), may be due to delayed fission, that is, to the 
lapse of an appreciable period between the division of the nucleus and the observed separa- 
tion of the cell into two parts. 

7. In the Bacillus group, the measurements are not relevant to the hypotheses, because 
the organisms are multicellular. 

8. Improved techniques are required for the further pursuit of the subject. 


T. W. Pearce and Jean M. Scott have shared with me the labour of observation. Much 
of the arithmetic was carried out by R. Ash. I am indebted to Dr D. W. Henderson for his 
confidence and encouragement and to S. Peto for his helpful criticism. In the analysis of 
data I have benefited greatly from Prof. E. S. Pearson’s guidance. Publication is by 
permission of the Chief Scientist, Ministry of Supply. 


REFERENCES 


Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proc. Roy. Soc. A, 160, 268-82. 

De Bary, H. A. (1884). Vergleichende Morphologie und Biologie der Pilze, Mycetozoen, und Bakterien. 
Leipzig. 

Bergersen, F. J. (1954). The filamentous forms of Bacillus megaterium. J. Gen. Microbiol. 11, 175-9. 

Bisset, K. A. (1950). The Cytology and Life-history of Bacteria. Edinburgh: E. and S. Livingstone. 

Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318-35. 

Daniels, H. E. (1952). The covering circle of a sample from a circular normal distribution. Biometrika, 
39, 137-43. 

Dawson, I. M. & Stern, H. (1954). Changes in the bacterial cell wall during cell division as seen in 
the electron microscope. J. Gen. Microbiol. 10, iii. 

Finney, D. J. & Martin, L. (1951). A re-examination of Rahn's data on the number of genes in 
bacteria. Biometrics, 7, 133-44. 

Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. A, 222, 
309-68. 

Harris, Nora K. & Powell, E. O. (1951). A culture chamber for the microscopical study of living 
bacteria, with some observations on the spore-bearing aerobes. J. R. Micr. Soc. 71, 407-20. 

Harris, T. E. (1951). Some mathematical models for branching processes. Proc. Second Berkeley 
Symposium on Mathematical Statistics and Probability, p. 305. 

Kelly, C. D. & Rahn, O. (1932). The growth rate of individual bacterial cells. J. Bact. 23, 147-53. 

Kendall, D. G. (1948). On the role of a variable generation time in the development of a stochastic 
birth process. Biometrika, 35, 316-30. 

Kendall, D. G. (1952a). On the choice of a mathematical model to represent normal bacterial 
growth. J. R. Statist. Soc. B, 14, 41-4. 

Kendall, D. G. (1952b). Les processus stochastiques de croissance en biologie. Ann. Inst. Poincaré, 
43-108. 

Lominski, I. & Lendrum, A. C. (1947). The mechanism of swarming of Proteus. J. Path. Bact. 59, 
688-91. 

Ørskov, J. (1922). Method for the isolation of bacteria in pure culture from single cells and procedure 
for the direct tracing of bacterial growth on a solid medium. J. Bact. 7, 537-49. 

Pearce, T. W. & Powell, E. O. (1951). New techniques for the study of growing micro-organisms. 
J. Gen. Microbiol. 5, 91-103. 

Rahn, O. (1932). A chemical explanation of the variability of the growth rate. J. Gen. Physiol. 15, 
257-77. 

Schaudinn, F. (1902). Beiträge zur Kenntnis der Bakterien und verwandter Organismen. I. Bacillus 
bütschlii, n.sp. Arch. Protistenk. 1, 306-43. 

Sherman, J. & Ewell, R. B. (1942). A six-place table of Einstein functions. J. Phys. Chem. 46, 
41-61. 

Yule, G. U. (1925). A mathematical theory of evolution, based on the conclusions of Dr J. C. Willis, 
F.R.S. Philos. Trans. B, 213, 21-87. 




QUANTUM HYPOTHESES 


By S. R. BROADBENT 
The British Coal Utilisation Research Association 
1. INTRODUCTION 


(1.1) Statistical problems which involve a mixed population are complex, even when the 
component subpopulations can be assumed to be normal. Karl Pearson (1894) noted in his 
dissection of a frequency curve into two normal components, 'It may happen that we have 
a mixture of 2, 3, ..., n homogeneous groups, each of which deviates about its own mean 
symmetrically, and in a manner represented with sufficient accuracy by the normal curve.... 
The equations for the dissection of a frequency curve into n normal curves can be written 
down as for the special case of n = 2 treated in this paper; they require us only to calculate 
higher moments. But the analytical difficulties, even for the case of n = 2, are so formidable, 
that it may be questioned whether the general theory could ever be applied in practice to 
any numerical case.' 

The present paper discusses the simpler problem in which the means of the components 
are equally spaced. The hypothesis that this is the case has been called by Hammersley 
& Morton (1954) a quantum hypothesis, for the means are then a constant plus multiples of 
a basic quantity or quantum. 

Examples of such distributions will be found in the paper and in the references cited. 
It is worth noting that such rules as Brook’s law (1886), formulated by Fowler (1909) in 
his study of the Ostracoda, that ‘during early growth, each stage increases at each moult 
by a fixed percentage of its length, which is approximately constant for its species and sex’, 
and Przibram's rule (1912) discussed by Wigglesworth (1942), that 'the weight doubles 
at each instar, and at each moult all linear dimensions are multiplied by ∛2', take the form 
of quantum hypotheses when the logarithms of the weights or lengths are taken. Therefore 
situations in which a quantum hypothesis applies after the variate is transformed are here 
considered. 

(1.2) Data suggest a quantum hypothesis by the occurrence of regularly spaced modes. 
From such data three results are commonly required: 

(i) an estimate of the quantum which determines the spacing of the modes, 
(ii) an estimate of the scatter within the subdistributions, and 

(iii) a demonstration that the quantum rule is not disobeyed, i.e. that the data are not 
more likely to come from some other distribution, perhaps unimodal. 

When the valleys between the modes are very well marked, there is little difficulty in 
meeting these requirements. Controversy about the interpretation of some data shows that 
a statistical treatment is required for those cases in which grouping at the modes is less 
obvious. 

In the most difficult situation we are presented with data alleged to support a quantum 
hypothesis. No independent information is available; the hypothesis has arisen from a study 
of the data and the means are estimated from the observations. We are required to test 
whether the observations are genuinely grouped about these means. For examples of this 
type, see Hammersley & Morton's (1954) Druid Circle problem and Grant's (1952)
measurements of the energy levels of atomic nuclei. Hammersley & Morton conjecture that in this 
form the problem is beyond present-day analytic resolution, and offer a Monte Carlo method 
of testing the supposed grouping. In this case the difficulty is that the data which prompted 
the hypothesis are also used for estimation and for testing, and it is not clear what
allowance can be made in a test for such previous use. The warning given by Pearson &
Chandrasekar (1936) is relevant: 'To base the choice of the test of a statistical hypothesis upon an 
inspection of the observations is a dangerous practice; a study of the configuration of the 
sample is almost certain to reveal some feature, or features, which are exceptional if the 
hypothesis is true....By choosing the feature most unfavourable to the hypothesis out of 
a very large number of features examined, it will usually be possible to find some reason 
for rejecting the hypothesis.’ We need add only that it will be equally possible to choose 
some untrue hypothesis and to find some test which favours it. 


Fig. 1. Frequency curve on the quantum hypothesis. 
(Curves labelled: distribution of y; sub-distribution of ε.)


(1.3) The population we shall consider will be compounded of normal subdistributions 
with means equally spaced at β + 2rδ (r = 0, 1, ..., m). The probability that a randomly 
chosen member of the population is from the rth subdistribution, i.e. has expectation 
β + 2rδ, is $p_r$; not all the $p_r$ are supposed non-zero, but $\sum_{r=0}^{m} p_r = 1$. The frequency function of 
such a population is shown in Fig. 1. The subdistributions may be homoscedastic, with 
common variance σ², or the S.D. may increase linearly with the mean. 


(1.4) The frequency histogram of n observations from such a population will usually 
consist of regularly spaced peaks if σ/δ is small and the $p_r$ are not too different. As σ/δ 
increases the valleys between the peaks disappear and it becomes difficult to assign an 
observation to its correct subdistribution. In these circumstances the $p_r$ determine the 
overall appearance of the population, i.e. whether it is unimodal, symmetrical and so on. 
It is clear that using this model and by suitable choice of the parameters we can approximate 
with any required precision to any distribution, and that the graduation between
multimodal and unimodal distributions is continuous. 
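
The passage from a multimodal to a unimodal frequency curve as σ/δ increases can be illustrated by counting the local maxima of the compound density on a fine grid. The following is a minimal sketch, assuming homoscedastic normal components with means β + 2rδ; the function names, the four equal weights and the two values of σ are hypothetical.

```python
import math

def mixture_density(y, beta, delta, sigma, weights):
    """Density of normal components with means beta + 2*r*delta and common s.d. sigma."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return sum(p * math.exp(-0.5 * ((y - beta - 2 * r * delta) / sigma) ** 2) / norm
               for r, p in enumerate(weights))

def count_modes(beta, delta, sigma, weights, step=0.01):
    """Number of strict local maxima of the density on a grid of the given step."""
    m = len(weights) - 1
    lo = beta - 4 * sigma
    hi = beta + 2 * m * delta + 4 * sigma
    n = int(round((hi - lo) / step))
    f = [mixture_density(lo + i * step, beta, delta, sigma, weights)
         for i in range(n + 1)]
    return sum(1 for i in range(1, n) if f[i - 1] < f[i] > f[i + 1])

weights = [0.25, 0.25, 0.25, 0.25]
modes_small_ratio = count_modes(0.0, 1.0, 0.3, weights)  # sigma/delta = 0.3
modes_large_ratio = count_modes(0.0, 1.0, 3.0, weights)  # sigma/delta = 3
```

With σ/δ = 0.3 each component shows as a separate peak, whereas with σ/δ = 3 the four components merge into a single hump, and nothing in the shape of the curve then betrays the quantum.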

Just as the genuine quantum model merges into the unimodal model, so it is possible to 
find a quantum which fits data from any distribution, since experimental data are usually 
rational. We have only to take the H.C.F. of the observations, after adding an arbitrary 
constant, to obtain a quantum which fits the data precisely. Other constants are possible 
which fit the data less exactly, and even if the observations were not rational, a quantum 
can be found to fit the data as closely as we please. Equally, when a quantum fits the data 
we have only Occam’s principle to exclude the possibility that the true quantum is a half 


or a third of the one we have accepted. The difficulties of estimation in a similar situation 
have been discussed by Hammersley (1950). 
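
The ease with which an exactly fitting quantum can be manufactured from rational data is shown by the following minimal sketch, which assumes the observations are supplied as decimal strings or integers so that they can be held as exact fractions; the function name and the three observations are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def exact_quantum(observations, constant=0):
    """H.C.F. of the observations after subtracting an arbitrary constant."""
    shifted = [Fraction(x) - Fraction(constant) for x in observations]
    # Bring every shifted value to a common denominator.
    common_den = reduce(lambda a, b: a * b // gcd(a, b),
                        (f.denominator for f in shifted), 1)
    numerators = [int(f * common_den) for f in shifted]
    return Fraction(reduce(gcd, numerators, 0), common_den)

data = ["1.25", "2.75", "4.25"]
q = exact_quantum(data, "0.5")  # shifted values 3/4, 9/4, 15/4
multiples = [(Fraction(x) - Fraction("0.5")) / q for x in data]
half_multiples = [(Fraction(x) - Fraction("0.5")) / (q / 2) for x in data]
```

Every shifted observation is an integral multiple of q, and equally of q/2 or q/3, so the data alone cannot exclude a smaller quantum.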

Obviously when so many alternatives are so easily produced we must be careful to state 
explicitly what assumptions are being made and where independent evidence is being used. 


(1.5) In this paper the problems considered will be limited to the following:

(i) Estimation of the positions of the modes when the observations have been allotted 
to the correct subdistributions, and these subdistributions are normal. 

(ii) Estimation of the variance of each component when the subdistributions are normal 
and either the observations have been allotted to the correct subdistributions (positions 
of the modes unknown) or the positions of the modes are known (observations not allotted 
to subdistributions). 

(iii) Testing whether to accept a hypothesis which specifies the positions of the modes
and which is independent of the data used in the test. The normality of the subdistributions
is not here assumed, the $p_r$ are not specified, nor are the observations allotted to
subdistributions. The hypothesis against which the quantum hypothesis is compared is one of
a class described in § (4.2).


2. ESTIMATION: MODES 

(2.1) In this section it is assumed that the data can be allotted to the correct subdistributions,
i.e. that when we are given any observation we can say it comes from the $r$th
subdistribution, although we do not know precisely the location of this subdistribution. This
may be possible because the data fall into clearly defined groups with every component 
known to be represented or for some other reason. For example, in the measurement of 
insects during moulting, each group or subdistribution is composed of measurements 
after a defined number of moults. 

Let $y_{rs}$ be the $s$th observation in the $r$th subdivision, with mean $\beta + 2r\delta$. Then

$$y_{rs} = \beta + 2r\delta + \epsilon_{rs},$$

where $\beta$ and $\delta$ are unknown constants ($2\delta$ is the quantum), $r$ is zero or a positive integer, and
$\epsilon_{rs}$ is the normal error, or deviation of $y_{rs}$ from its mean, with mean zero. Let $r = 0, 1, \ldots, m$;
$s = 1, 2, \ldots, n_r$; $\sum_{r=0}^{m} n_r = n$; $\sum_{s=1}^{n_r} y_{rs} = Y_r$.


(2.2) The method that has frequently been used to detect the quantum situation and to
estimate the positions of the modes is to plot the means of the successive groups (or the
experimental modes; these may also be used when observations cannot be allotted to groups)
against $r$, and to draw a straight line through the points so obtained. A similar but efficient
estimation of $\beta$ and $2\delta$ is by the regression of $y_{rs}$ on $r$. The method is equivalent to
least-squares and to maximum-likelihood procedures for estimating the means of the
subdistributions subject to the restriction that they are in arithmetic progression. The results of
the regression analysis are summarized in the three sections below.


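(2.3) $\epsilon_{rs}$ has s.d. $\sigma$ (all $r$). The equations giving estimates $b$ and $2d$ for $\beta$ and $2\delta$ are

$$b = \left(\sum r^2 n_r \sum Y_r - \sum r n_r \sum r Y_r\right)/\Delta,$$
$$2d = \left(n \sum r Y_r - \sum r n_r \sum Y_r\right)/\Delta,$$

where $\Delta = n \sum r^2 n_r - (\sum r n_r)^2$, and all summations are over $r = 0, 1, \ldots, m$.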




The properties deducible from regression coefficients apply to $b$ (the intercept of the
regression line with the ordinate) and $2d$ (the slope). We have:

(i) the variance $\sigma^2$ of each $y_{rs}$ about its mean is estimated by $s_1^2 = S_1^2/(n - 2)$, where

$$S_1^2 = \left\{\Delta\Delta' - \left(n \sum r Y_r - \sum r n_r \sum Y_r\right)^2\right\}/(n\Delta),$$

and $\Delta' = n \sum\sum y_{rs}^2 - (\sum Y_r)^2$.

(ii) $S_1^2$ has $\sigma^2$ times the $\chi^2$ distribution with $(n - 2)$ d.f.

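(iii) the variance of $2d$, estimator of the quantum $2\delta$, is itself estimated by $n s_1^2/\Delta$
(the slope has variance $\sigma^2/\sum n_r (r - \bar{r})^2$, and $\sum n_r (r - \bar{r})^2 = \Delta/n$).

(iv) $n s_1^2/\Delta$ is independent of $2d$, so the $t$-test may be applied to hypotheses about $2\delta$.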

(v) The variance of $b$, estimator of $\beta$, is itself estimated (not independently of the variance
of $2d$) by $s_1^2 \left(\sum r^2 n_r/\Delta\right)$.
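
The estimators of § (2.3) can be checked numerically. What follows is a minimal sketch in Python, not part of the original paper; the function name `fit_quantum` and the dictionary layout of the data are illustrative assumptions. The variance of the slope is taken as $n s_1^2/\Delta$, which is the usual regression result $\sigma^2/\sum n_r (r - \bar{r})^2$ written in terms of $\Delta$.

```python
def fit_quantum(groups):
    """Least-squares fit of y_rs = beta + 2*r*delta + error.

    `groups` maps each integer r to the list of observations allotted to
    subdistribution r (a hypothetical input layout chosen for this sketch).
    """
    n = sum(len(ys) for ys in groups.values())
    sum_Y = sum(sum(ys) for ys in groups.values())            # sum of Y_r
    sum_rn = sum(r * len(ys) for r, ys in groups.items())     # sum of r*n_r
    sum_r2n = sum(r * r * len(ys) for r, ys in groups.items())
    sum_rY = sum(r * sum(ys) for r, ys in groups.items())     # sum of r*Y_r
    sum_y2 = sum(y * y for ys in groups.values() for y in ys)

    big_delta = n * sum_r2n - sum_rn ** 2                     # Delta
    slope_num = n * sum_rY - sum_rn * sum_Y
    b = (sum_r2n * sum_Y - sum_rn * sum_rY) / big_delta       # estimate of beta
    two_d = slope_num / big_delta                             # estimate of 2*delta
    delta_prime = n * sum_y2 - sum_Y ** 2                     # Delta'
    S1_sq = (big_delta * delta_prime - slope_num ** 2) / (n * big_delta)
    s1_sq = S1_sq / (n - 2)                                   # estimate of sigma^2
    return {
        "b": b,
        "two_d": two_d,
        "s1_sq": s1_sq,
        "var_two_d": n * s1_sq / big_delta,
        "var_b": s1_sq * sum_r2n / big_delta,
    }


# Noise-free data on the line y = 3 + 2r: the fit should recover it exactly.
fit = fit_quantum({0: [3, 3], 1: [5], 2: [7, 7, 7]})
```

With real data the returned `var_two_d` would be used, together with Student's $t$ on $(n - 2)$ d.f., for hypotheses about the quantum.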


(2.4) $\beta = 0$ given; $\epsilon_{rs}$ has s.d. $\sigma$ (all $r$). The estimator for $2\delta$ is

$$2d = \sum r Y_r / \sum r^2 n_r.$$

(i) the variance $\sigma^2$ of each $y_{rs}$ about its mean is now estimated by $s_2^2 = S_2^2/(n - 1)$, where

$$S_2^2 = \sum\sum y_{rs}^2 - \left(\sum r Y_r\right)^2 / \sum r^2 n_r.$$

(ii) $S_2^2$ has $\sigma^2$ times the $\chi^2$ distribution with $(n - 1)$ d.f.
(iii) the variance of $2d$ is itself estimated by $s_2^2/\sum r^2 n_r$.
(iv) $s_2^2/\sum r^2 n_r$ is independent of $2d$, so the $t$-test is applicable to hypotheses about $2\delta$.


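(2.5) $\beta = 0$ given; $\epsilon_{rs}$ has s.d. $r\sigma$. The estimator for $2\delta$ is

$$2d = \sum (Y_r/r)/n.$$

This is the arithmetic mean of the homoscedastic ratios $y_{rs}/r$, i.e. in the regression model
each observed point gives an equally precise estimate of the slope.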

The variance $\sigma^2$ is estimated by essentially the same method as that implied in (2.3)
and (2.4). Consider the identity

$$(y_{rs} - 2r\delta)/r = (y_{rs} - 2rd)/r + 2(d - \delta).$$

Of these quantities the first is distributed normally with mean zero and variance $\sigma^2$, the
second does not need $\delta$ in its calculation, while the third has mean zero and variance $\sigma^2/n$.
It follows that

$$\sum\sum (y_{rs} - 2r\delta)^2/r^2 = \sum\sum (y_{rs} - 2rd)^2/r^2 + 4n(d - \delta)^2,$$

and by Cochran's theorem that

$$S_3^2 = \sum_r \sum_s (y_{rs} - 2rd)^2/r^2$$

has $\sigma^2$ times the $\chi^2$ distribution with $(n - 1)$ d.f.

(i) The variance $\sigma^2$ is estimated by $s_3^2 = S_3^2/(n - 1)$, where $S_3^2$ is calculated from

$$S_3^2 = \sum\sum (y_{rs}/r)^2 - \left\{\sum (Y_r/r)\right\}^2/n.$$

This is otherwise clear, since $y_{rs}/r$ has variance $\sigma^2$.

(ii) The variance of $2d$, estimator of $2\delta$, is itself estimated by $s_3^2/n$.

(iii) $s_3^2$ is independent of $2d$, so the $t$-test may be applied to hypotheses about $2\delta$.

(iv) If we wish to set limits to a single new observation in the $r$th group, we must take
into account the uncertainty of the supposed mean $2rd$, which contributes a variance
$r^2\sigma^2/n$, and the variance $r^2\sigma^2$ of a single observation in this group. The limits are

$$r\left\{2d \pm t\, s_3 \sqrt{(n + 1)/n}\right\},$$

where $t$ denotes the value of Student's $t$ with $(n - 1)$ d.f. at the appropriate probability
level.
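
The calculations of § (2.5) can be sketched as follows. This is an illustrative Python fragment, not the author's procedure as printed: the names `ratio_estimate` and `new_observation_limits` are hypothetical, the data are invented, and the value of Student's $t$ is assumed to be looked up separately and passed in.

```python
import math


def ratio_estimate(groups):
    """Estimate 2*delta when beta = 0 and the s.d. of group r is r*sigma.

    `groups` maps each factor r (r >= 1) to its observations y_rs.
    Returns (two_d, s3_sq, n).
    """
    ratios = [y / r for r, ys in groups.items() for y in ys]   # y_rs / r
    n = len(ratios)
    two_d = sum(ratios) / n                                    # mean of the ratios
    S3_sq = sum(q * q for q in ratios) - sum(ratios) ** 2 / n
    return two_d, S3_sq / (n - 1), n


def new_observation_limits(r, two_d, s3_sq, n, t_value):
    """Limits r*{2d -/+ t*s3*sqrt((n+1)/n)} for one new observation in group r.

    `t_value` is Student's t on (n - 1) d.f., supplied by the caller.
    """
    half_width = t_value * math.sqrt(s3_sq * (n + 1) / n)
    return r * (two_d - half_width), r * (two_d + half_width)


# Invented data: ratios y/r are 10, 12, 11, 11, 10.
two_d, s3_sq, n = ratio_estimate({1: [10, 12], 2: [22], 3: [33, 30]})
low, high = new_observation_limits(2, two_d, s3_sq, n, t_value=2.776)
```

The standard error of $2d$ itself is `math.sqrt(s3_sq / n)`, in agreement with (ii).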

(2.6) Example of estimation. Svedberg (1939) gives the molecular weights of fifty-six
proteins determined by sedimentation velocity or by sedimentation equilibrium. For
twenty proteins both methods have been used so that seventy-six measurements in all are
given. Svedberg deduces a ‘law of simple multiples. ... If we choose 17,600 as the unit the
majority of the proteins may be divided into eleven classes with molecular weights which
are multiples of this unit by factors containing powers of 2 and 3. The rule is only approximate,
indicating that the underlying principle is obscured by some secondary factor.’
He notes seventy observations in eleven classes which he considers obey this law, and he
gives the factor relevant to each class by which the unit is multiplied.

We take these seventy observations as a random sample from a population of the type
considered in § (2.5), and for each class take the factor given by Svedberg as $r$. It is clear
from a study of the data that it is appropriate to suppose the s.d. of each class is linearly
related to its mean. We can then apply the method of § (2.5) to estimate the quantum for
which Svedberg gives the value 17,600. We obtain $2d = 17{,}920$, $s_3^2 = 3.83 \times 10^6$ (69 d.f.), i.e.
95 % limits to $2\delta$ are 17,460 to 18,390.

Six observations are so far from Svedberg’s supposed modes that he did not classify them. 
When the limits within which a new observation may be expected to lie are calculated it is 
found that two of the outliers are within 95 % limits, two within 98 % limits, and two within 
99 % limits. Therefore these extreme observations are not inconsistent with the remainder 
of the data. 

It may be remarked here that Johnston, Longuet-Higgins & Ogston (1945) consider that
Svedberg's data do not support his hypothesis. Their tests will be discussed below in § (4.6).


3. ESTIMATION: VARIANCE 

(3.1) In §§ (2.3), (2.4) and (2.5) an estimator for $\sigma^2$ and the form of its distribution were
obtained when the positions of the modes were estimated.

(3.2) Now suppose the positions of the modes are known, independently of the data,
but that the observations have not been allotted to subdistributions and may overlap,
i.e. lie nearer to another mean than their own. The weights $p_r$ are not known. The distribution
is that of § (2.1).

Let $z_i$ ($i = 1, 2, \ldots, n$) be the distance of $y_{rs}$ from the nearest mode, i.e.

$$z_i = y_{rs} - (\beta + 2r'\delta),$$

where $r'$ is chosen to minimize $|z_i|$. Clearly $|z_i| \le \delta$, and $z_i$ is $\epsilon_{rs}$ plus or minus an integer
(or zero) multiple of $2\delta$. For those $\epsilon_{rs}$ satisfying $|\epsilon_{rs}| < \delta$, the distribution of $z_i$ is the same as
that of $\epsilon_{rs}$, i.e. normal with mean zero and variance $\sigma^2$ truncated at $\pm\delta$. For those $\epsilon_{rs}$
satisfying $\delta < \epsilon_{rs} < 3\delta$, the distribution of $z_i$ is the same as that of $(\epsilon_{rs} - 2\delta)$ truncated at $\pm\delta$,
and so on. The effect of transforming from $y$ to $z$ is to cut the overall frequency distribution
of $y$ at $\ldots, \beta - \delta, \beta + \delta, \beta + 3\delta, \ldots$ and to lump together the truncated portions. Since each
subdistribution is the same, save for its mean and $p_r$, the result is shown in Fig. 2, where the
subdistribution (dotted line) is transformed into the distribution of $z$ (full line) by successively
‘turning in’ the tails of the distribution. It will be noticed that the $p_r$ do not appear in the
distribution of $z$, nor is the allocation of $y_{rs}$ to its correct subdistribution relevant to $z$.


Fig. 2. Lumping the normal distribution (full line: distribution of $z$; dotted line: distribution of $\epsilon$).


The lumped variance of the observations, $s^2$, is defined by

$$s^2 = \sum_{i=1}^{n} z_i^2 / n.$$
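
As a concrete illustration of this definition, the following Python sketch computes $s^2/\delta^2$ for modes at $\beta + 2r\delta$. It is an assumption-laden example rather than the paper's own procedure: the function name `lumped_ratio` is hypothetical, and for simplicity the nearest mode is sought over all integers $r'$, not only $r' = 0, 1, \ldots, m$.

```python
def lumped_ratio(ys, beta, delta):
    """Return s^2/delta^2, where s^2 is the mean squared distance of the
    observations from the nearest mode beta + 2*r*delta (r any integer)."""
    total = 0.0
    for y in ys:
        r_near = round((y - beta) / (2 * delta))   # index of the nearest mode
        z = y - (beta + 2 * r_near * delta)        # so that |z| <= delta
        total += z * z
    return total / (len(ys) * delta ** 2)


# Modes at 0, 2, 4, ... (beta = 0, delta = 1); distances are 0.5, 0, 1, 0.5.
ratio = lumped_ratio([0.5, 2.0, 3.0, -0.5], beta=0.0, delta=1.0)
```

Values of the ratio well below 1/3 point towards grouping about the supposed modes.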


We now consider the distribution of $s^2/\delta^2$ on the quantum hypothesis when the
subdistributions are normal. By the central limit theorem $s^2/\delta^2$ is asymptotically normal, and
we therefore derive its first two moments. The mean of $s^2/\delta^2$ is the same as the mean of
$z^2/\delta^2$, and the variance of $s^2/\delta^2$ is $1/n$ that of $z^2/\delta^2$.

Now each $z^2$ is equal to $\epsilon^2$ for $|\epsilon| \le \delta$, and for all other $\epsilon$ these values are repeated with
period $2\delta$. We may represent $z^2$ by a Fourier cosine series:

$$z^2 = \delta^2 \left\{ \frac{1}{3} + \frac{4}{\pi^2} \sum_{r=1}^{\infty} \frac{(-1)^r}{r^2} \cos\frac{r\pi\epsilon}{\delta} \right\}.$$

When calculating $E[z^2/\delta^2]$ we may integrate the series term-by-term, since it is a uniformly
convergent series of continuous terms for all $\epsilon$. We obtain

$$E[z^2/\delta^2] = \frac{1}{3} + \frac{4}{\pi^2} \sum_{r=1}^{\infty} \frac{(-1)^r}{r^2} \exp\left(-\frac{r^2\pi^2\sigma^2}{2\delta^2}\right).$$

Expanding $z^4$ as a Fourier series we obtain similarly

$$E[z^4/\delta^4] = \frac{1}{5} + \frac{8}{\pi^4} \sum_{r=1}^{\infty} \frac{(-1)^r}{r^4} (r^2\pi^2 - 6) \exp\left(-\frac{r^2\pi^2\sigma^2}{2\delta^2}\right).$$

The two infinite series thus introduced are functions of $\tfrac{1}{2}\pi^2\sigma^2/\delta^2$ and have been tabulated
by Newman (1934) in this form. They are integrals of one of the theta functions. We give
in Table 1 some values of $E[z^2/\delta^2]$ and $\mathrm{Var}[z^2/\delta^2]$ deduced from Newman's tables.
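
The two series can also be summed directly. The sketch below is an illustration added here, not taken from Newman's tables; it assumes the series have the standard Fourier form for $\epsilon^2$ and $\epsilon^4$ on $(-\delta, \delta)$ with a normal $\epsilon$, and the function names are hypothetical. The argument `x` stands for $\tfrac{1}{2}\pi^2\sigma^2/\delta^2$.

```python
import math


def mean_z2(x, terms=50):
    """E[z^2/delta^2] as a function of x = 0.5*pi^2*sigma^2/delta^2."""
    series = sum((-1) ** r / r ** 2 * math.exp(-r * r * x)
                 for r in range(1, terms + 1))
    return 1 / 3 + 4 / math.pi ** 2 * series


def mean_z4(x, terms=50):
    """E[z^4/delta^4] from the Fourier series of the fourth power."""
    series = sum((-1) ** r * (r * r * math.pi ** 2 - 6) / r ** 4
                 * math.exp(-r * r * x)
                 for r in range(1, terms + 1))
    return 1 / 5 + 8 / math.pi ** 4 * series


def var_z2(x, terms=50):
    """Var[z^2/delta^2] = E[z^4/delta^4] - (E[z^2/delta^2])^2."""
    return mean_z4(x, terms) - mean_z2(x, terms) ** 2


# Values to compare with the tabulated entries at x = 0.5, 1.0 and 1.5.
check = [(x, round(mean_z2(x), 4), round(var_z2(x), 4)) for x in (0.5, 1.0, 1.5)]
```

For large `x` the mean tends to 1/3 and the variance to 4/45, the values for a rectangular distribution of $z$.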

Now we write $E[z^2/\delta^2] = g(\sigma^2/\delta^2)$, the function $g$ being tabulated below. It follows that
$s^2/\delta^2$ is an unbiased and consistent estimator of $g(\sigma^2/\delta^2)$:

$$g(\sigma^2/\delta^2) \simeq s^2/\delta^2.$$

Since we require an estimator of $\sigma^2$ explicitly, we rewrite this equation,

$$\sigma^2 \simeq \delta^2 g^{-1}(s^2/\delta^2) = \delta^2 h(s^2/\delta^2).$$

This estimator is consistent and is recommended for general use rather than a
maximum-likelihood estimator (which can be calculated if required) because of its simplicity. A table
of $h(s^2/\delta^2)$ is needed when this estimator is used; it is given as Table 2, and has been obtained
by interpolation in Table 1.
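
Where a table of $h$ is not to hand, the inverse function can be approximated numerically. The following Python sketch is one way of doing so and is not the interpolation used in the paper: it assumes that $g$ is the series for $E[z^2/\delta^2]$ written as a function of $v = \sigma^2/\delta^2$, that $g$ increases with $v$, and it inverts $g$ by bisection. The names `g` and `h` mirror the text; the search interval is an arbitrary choice.

```python
import math


def g(v, terms=200):
    """E[z^2/delta^2] as a function of v = sigma^2/delta^2 (assumed series form)."""
    x = 0.5 * math.pi ** 2 * v
    series = sum((-1) ** r / r ** 2 * math.exp(-r * r * x)
                 for r in range(1, terms + 1))
    return 1 / 3 + 4 / math.pi ** 2 * series


def h(ratio, tol=1e-10):
    """Approximate inverse of g by bisection: returns v with g(v) = ratio."""
    if not 0.0 <= ratio < 1 / 3:
        raise ValueError("s^2/delta^2 must lie in [0, 1/3) for an estimate to exist")
    lo, hi = 0.0, 10.0            # g(10) is already indistinguishable from 1/3
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Variance estimate delta^2 * h(s^2/delta^2) for s^2/delta^2 = 0.177, delta = 0.063.
variance_estimate = 0.063 ** 2 * h(0.177)
```

The result can differ in the third decimal place from a value read by interpolation from a short table.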


Table 1. Mean and variance of $z^2/\delta^2$ on the quantum hypothesis.
The variance of $s^2/\delta^2$ is $1/n$ that of $z^2/\delta^2$

$\tfrac{1}{2}\pi^2\sigma^2/\delta^2$    $E[z^2/\delta^2]$    $\mathrm{Var}[z^2/\delta^2]$

0.0        0.0000      0.0000
0.1        0.0203      0.0008
0.2        0.0405      0.0033
0.3        0.0608      0.0074
0.4        0.0809      0.0129
0.5        0.1007      0.0194
0.6        0.1199      0.0264
0.7        0.1382      0.0334
0.8        0.1553      0.0400
0.9
1.0        0.1861
1.5        0.2431      0.0704
2.0        0.2785      0.0795
2.5        0.3001      0.0839
3.0        0.3132      0.0861
3.5        0.3211
4.0        0.3259      0.0880
4.5        0.3288      0.0883
5.0        0.3306      0.0886
$\infty$   0.3333


4. THE LUMPED VARIANCE TEST 


(4.1) Suppose we are in the situation of § (3.2), so that the positions of the suspected
modes have been given independently of the data. The observations are to be used to test
whether there is real grouping about these modes. We neither know the weights $p_r$ to be
attached to each mode nor can we allot any observation to its correct subdistribution. The
alternative to the quantum hypothesis has not been specified, but we wish to interpret
the intuitive feeling that the alternative hypothesis implies no preference for the suspected
modes and is perhaps rectangular or unimodal. We would rather not assume too much about
the form of the subdistributions (e.g. normality), and a test which is in some sense
distribution-free will therefore have advantages.

(4.2) In these circumstances a test using the lumped variance is appropriate. It has
already been noted that neither the $p_r$ nor the allocation of observations to the correct
subdistributions are relevant to the distribution of $s^2$. Further, when the observations are
clustered about the suspected modes $s^2$ is small in relation to $\delta^2$. The tendency of the
observations to be grouped at the modes is therefore measured by $s^2/\delta^2$, and the form of the
subdistributions will be used only in this respect. It is, of course, true that if the true modes
are not those suspected but are near them, and $\sigma/\delta$ is small, then $s^2/\delta^2$ will again be small.

If $y$ has a rectangular distribution, $z$ has also a rectangular distribution between $-\delta$
and $+\delta$. If $y$ has a smooth unimodal distribution whose spread is not small in comparison
with $\delta$, a little thought will show that $z$ has approximately the rectangular distribution.
Therefore, we take the hypothesis that $z$ is rectangularly distributed between $-\delta$ and $+\delta$
to be our null hypothesis. On this rectangular hypothesis $s^2/\delta^2$ is the mean of $n$ independent
variates each of which can be shown to have mean $\tfrac{1}{3}$ and variance $\tfrac{4}{45}$. These are also the
limits of the mean and variance of $z^2/\delta^2$ on the quantum hypothesis as $\sigma/\delta \to \infty$. By the
central limit theorem, $s^2/\delta^2$ is approximately normally distributed with mean $\tfrac{1}{3}$ and variance
$4/(45n)$ when $n$ is large; for $n > 20$ the approximation may be expected to be good.
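
The normal approximation just described leads directly to critical values for $s^2/\delta^2$. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's own computation: it assumes a one-tailed test in the lower tail at level `p`, and the function name `lumped_critical_value` is hypothetical.

```python
import math
from statistics import NormalDist


def lumped_critical_value(n, p):
    """Lower-tail critical value of s^2/delta^2 on the rectangular hypothesis,
    using the normal approximation with mean 1/3 and variance 4/(45 n)."""
    sd = math.sqrt(4 / (45 * n))
    return 1 / 3 + NormalDist().inv_cdf(p) * sd   # inv_cdf(p) < 0 for p < 0.5


# Critical values for n = 100 at three probability levels.
levels = [round(lumped_critical_value(100, p), 4) for p in (0.05, 0.01, 0.001)]
```

Under these assumptions the values for $n = 100$ agree with the Table 3 entries 0.2843, 0.2640 and 0.2412.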


Table 2. $h(s^2/\delta^2)$. An estimate of $\sigma^2$ is given by $\delta^2 h(s^2/\delta^2)$

$s^2/\delta^2$    $h(s^2/\delta^2)$

0.16      0.167
0.17      0.180
0.18      0.194
0.19      0.208
0.20      0.223
0.21      0.238
0.22      0.255
0.23      0.274
0.24      0.295
0.25      0.318
0.26      0.344
0.27      0.374
0.28      0.409
0.29      0.452
0.30      0.506
0.31      0.577
0.32      0.690


On the quantum hypothesis $s^2/\delta^2$ has expectation less than $\tfrac{1}{3}$. As $\sigma^2$ increases the
distribution, although grouped, becomes indistinguishable from one of the class of alternatives.


(4.4) Example: distribution of length in Labidocera euchaeta. Seymour Sewell (1912)
measured the total lengths of large numbers of Copepoda with a view to testing whether
they followed Brooks's law (Fowler, 1909) as stated for Stomatopoda and Ostracoda. From
his data we extract Table 4, of the distribution of length of females of the species Labidocera
euchaeta. Sample A (Seymour Sewell's fig. 1) consists of 497 specimens from the Rangoon
River Estuary, and sample B (his fig. 2) of 157 collected after sample A from Chittagong.
We shall use these data to answer the question ‘Do the specimens from Chittagong show
any tendency to occur relatively more frequently at the modes from the first sample?’


Table 3. Lumped variance test. If $s^2/\delta^2$, calculated from $n$ observations, is less than the value
tabulated at probability level $P$, the quantum hypothesis is accepted at that probability level

   n     P = 0.05   P = 0.01   P = 0.001

   35     0.2504     0.2161     0.1776
   40     0.2558     0.2237     0.1878
   45     0.2602     0.2299     0.1960
   50     0.2640     0.2352     0.2030
   55     0.2672     0.2398     0.2091
   60     0.2700     0.2438     0.2144
   65     0.2725     0.2473     0.2191
   70     0.2747     0.2504     0.2232
   75     0.2767     0.2532     0.2269
   80     0.2785     0.2558     0.2303
   85     0.2801     0.2581     0.2334
   90     0.2816     0.2602     0.2362
   95     0.2830     0.2622     0.2388
  100     0.2843     0.2640     0.2412
  150     0.2933     0.2767     0.2581
  200     0.2987     0.2843     0.2682
  250     0.3023     0.2895     0.2751
  300     0.3050     0.2933     0.2801
  350     0.3071     0.2963     0.2841
  400     0.3088     0.2987     0.2873
  450     0.3102     0.3006     0.2899
  500     0.3114     0.3023     0.2921
  550     0.3124     0.3038     0.2940
  600     0.3133     0.3050     0.2957
  650     0.3141     0.3061     0.2972
  700     0.3148     0.3071     0.2985
  750     0.3154     0.3080     0.2997
  800     0.3160     0.3088     0.3008
  850     0.3165     0.3095     0.3017
  900     0.3170     0.3102     0.3026
  950     0.3174     0.3108     0.3034
 1000     0.3178     0.3114     0.3042




It is expected from previous work that when length is taken on a logarithmic scale the 
data will be grouped about approximately equally spaced modes (any change in the growth 
factor is here being neglected). Because the groups overlap and no other information is 
available, the positions of the population modes of sample A cannot be estimated by the 
methods of § 2. When the experimental modes at 15, 22, 30, 42, 54 and 65 units of length
(the modes at 20, 36 and 49 units are seen from a figure to be spurious) are plotted on a 


Table 4. Distribution of lengths of females, Labidocera euchaeta
(unit of length 0.04 mm.). Seymour Sewell’s data


Sample B 

2 0 43 15 6 
3 0 44 5 3 
3 0 45 6 3 
3 0 46 9 2 
T 0 47 16 2 
0 0 48 18 2 
1 0 49 28 | 2 
3 0 50 21 8 
2 5 51 25 11 
4 0 52 25 12 
1 0 53 28 10 
2 0 54 31 9 
4 0 55 19 9 
5 1 56 9 % 
2 57 4 1 
5 58 2 2 
4 59 1 0 
2 60 0 1 
0 61 1 0 
0 62 0 1 
0 63 1 0 
1 64 T 2 
1 65 12 2 
3 66 9 0 
4 67 7 1 
7 68 4 0 
6 69 3 0 
5 70 1 1 
6 71 1 1 

7 
157 


logarithmic scale against $r = 0, 1, \ldots, 5$ they lie approximately on a straight line. Fitting
a line by eye we conclude that sample A suggests modes at 1.218 (0.126) 1.848 on a
logarithmic scale.

We next require to test whether sample B supports this grouping, and do so by the lumped
variance test. Taking logarithms, we calculate the square of the distance of each observation
of sample B from the nearest mode deduced from sample A. The sum of these squared
distances, divided by $n\delta^2 = 157\,(0.126/2)^2$, is 0.177. Comparison of this value with Table 3
shows that departure from the value expected on the rectangular hypothesis is highly
significant. We conclude that sample B supports strongly the grouping we have deduced
from sample A, i.e. that grouping is real and occurs near the modes given above.

If an estimate of the scatter about these modes is required, and the assumptions of § (3.2)
are justified, the method of that section gives an estimate of the variance about each mode
on a logarithmic scale. Since $s^2/\delta^2$ is 0.177, and $\delta^2$ is $(0.063)^2$, the variance is estimated as
$\delta^2 h(s^2/\delta^2) = (0.063)^2\,(0.189) = 0.000750$.

(4.5) Example: choosing a random point. The following simple experiment is described
because it exemplifies the use of lumped variance. It was designed to test whether the points 
on a line chosen ‘at random’ by a number of people were in fact grouped about certain 
regular points on the line. 

Twenty subjects were chosen from workers at B.C.U.R.A. and each was presented with
a sheet of paper on which were drawn eight parallel lines of length 12 cm. Each subject was
then asked to mark points on the lines at random, i.e. without systematic placing. He was
to make one mark on each of the first two lines, two, three and four on each of the successive
pairs of lines, making twenty marks in all. Alternate subjects marked their lines in the
reverse order.


Fig. 3. Choosing a random point. Control chart of $s^2/\delta^2$ against subject number.

The points about which systematic grouping was expected were those which divide the
line into $(n + 1)$ equal segments when $n$ marks were made. Thus when one mark was made it
was expected the end-points and centre would be preferred, when two marks were made
the end-points and thirds, and so on. In this situation the lumped variance test is applicable.
When $n$ points are marked and $l$ is the length of the line, $\delta$ is $l/\{2(n + 1)\}$, and $z$ is the distance
of a mark from the nearest ‘preferred’ point as defined above.

The values for $s^2/\delta^2$ found in the experiment are given in Table 5, and are shown also for
subjects in the form of a control chart. In Table 5, 1a denotes the first line on which one
mark was made, 1b the second line on which one was made, and so on.

The conclusion drawn is that no evidence was found that the subjects did not choose 
points on the lines at random; the data are consistent with the hypothesis that all points 




Table 5. Choosing a random point

Subject                 Subject
number   $s^2/\delta^2$     number   $s^2/\delta^2$     Line   $s^2/\delta^2$

   1      0.371           11      0.199         1a     0.369
   2      0.364           12      0.479         1b     0.294
   3      0.379           13      0.373
   4      0.366           14      0.414         2a     0.345
   5      0.417           15      0.413         2b     0.316
   6      0.328           16      0.170         3a     0.360
   7      0.336           17      0.286         3b     0.309
   8      0.241           18      0.325
   9      0.321           19      0.380         4a     0.348
  10      0.311           20      0.359         4b     0.364

Average (subjects) 0.342                        Weighted average (lines) 0.342


(4.6) Finally, we consider the statistical tests applied by Johnston et al. (1945) to
Svedberg's (1939) data. We have already noted, in § (1.2), the difficulty in testing a hypothesis
which is suggested by the data used in the test. Suppose, however, that a lumped variance
test is applied to such data, using the modes suggested by or calculated from the data, and
that the value of $s^2/\delta^2$ found is not significantly less than the value $\tfrac{1}{3}$ expected on the


grouping of the data about these modes. In such circumstances it would be wrong to accept
the quantum hypothesis solely because $s^2/\delta^2$ is less than a conventional significance
point, whereas it is proper to reject the quantum hypothesis if $s^2/\delta^2$ does not reach
significance


the notation defined above,

$$F(n) = 1 - 2\sum_{i=1}^{n} |z_i| / (n\delta).$$




It is distributed approximately normally about zero with variance $1/(3n)$ on the rectangular
hypothesis, and a positive value indicates grouping about the supposed modes.
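
The arithmetic of this test is easily reproduced. The following Python sketch is an illustration only; it assumes that the statistic has the form $F(n) = 1 - 2\sum |z_i|/(n\delta)$, and the function names are hypothetical. It evaluates the standard deviation and the one- and two-tailed probabilities for a reported value $F(n) = 0.137$ with $n = 56$.

```python
import math
from statistics import NormalDist


def correlation_function(zs, delta):
    """F(n) = 1 - 2*sum(|z_i|)/(n*delta) for distances z_i from the nearest mode."""
    n = len(zs)
    return 1 - 2 * sum(abs(z) for z in zs) / (n * delta)


def tail_probabilities(f_value, n):
    """Return (s.d., one-tailed P, two-tailed P) on the rectangular hypothesis,
    where F(n) is approximately normal about zero with variance 1/(3n)."""
    sd = math.sqrt(1 / (3 * n))
    one_tailed = 1 - NormalDist().cdf(f_value / sd)
    return sd, one_tailed, 2 * one_tailed


sd, p_one, p_two = tail_probabilities(0.137, 56)
```

With these figures the s.d. is about 0.077, the two-tailed probability about 0.076 and the one-tailed probability about 0.038.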

They extend this definition of $F(n)$ to those cases in which the modes are not equally
spaced, taking as the $\delta$ corresponding to each $z_i$ half the distance between the modes on
either side of $z_i$. A similar extension is, of course, possible for the statistic $\sum z_i^2/(n\delta^2)$ of § (3.2).

The rectangular hypothesis they specify in this case to be a rectangular distribution of $y$
between each mode; this is unnecessarily restricted, for we can consider the class of all
distributions of $y$ which correspond to a rectangular distribution of $z$.

Johnston et al. apply their test to the fifty-six values given by Svedberg, averaging the
results of two determinations when these are available. They test for grouping about the
values $2^n 3^m \times 17{,}600$ for all integral $n$ and $m$, and obtain for $F(n)$ the value 0.137 with
s.d. 0.077; this corresponds to a two-tailed probability of 0.076. Since this does not reach
significance they conclude Svedberg's hypothesis receives no support from the evidence.
A subdivision of the data is intended to show that only observations near the first two
Svedberg numbers support his hypothesis. Two further tests, applied to part of the data
only, and whose power may be suspected to be low, suggest a similar conclusion.

The probability given by Johnston et al. should properly be one-tailed, since the
alternative hypothesis suggests a positive $F(n)$. Their value therefore exceeds the 5 % significance
level. Moreover, Svedberg proposed eleven groups, the factor multiplying 17,600 being
of the form $2^n 3^m$ for certain values of $n$ and $m$ only. When the ‘correlation function’ test is
applied to the values given by Svedberg, $F(n)$ exceeds the 0.1 % significance level. The
lumped variance test gives the same result, indicating, contrary to the findings of Johnston
et al., that the data do support Svedberg’s hypothesis. For the reasons already given (§ 1.2)
no positive conclusions may be drawn from this.


The author is indebted to Prof. G. A. Barnard, Dr J. P. Harding, J. M. Hammersley and 
D. G. Kendall for their advice, and to the Director-General of The British Coal Utilisation 
Research Association for his permission to publish this paper. 


REFERENCES 


BROOKS, W. K. (1886). Report on the Scientific Results of the Exploring Voyage of H.M.S. Challenger,
1873-76, 16 (Zoological Section), Stomatopoda, 1.

FOWLER, G. H. (1909). Trans. Linn. Soc. (Zool.), 2nd ser. pt. 9.

GRANT, P. J. (1952). Proc. Phys. Soc., Lond., A, 65, 150.

HAMMERSLEY, J. M. (1950). J. R. Statist. Soc. B, 12, 192.

HAMMERSLEY, J. M. & MORTON, K. W. (1954). J. R. Statist. Soc. B, 16, 23.

JOHNSTON, J. P., LONGUET-HIGGINS, H. C. & OGSTON, A. G. (1945). Trans. Faraday Soc. 41, 588.

NEWMAN, A. B. (1934). Trans. Amer. Inst. Chem. Engrs, 30, 598.

PEARSON, E. S. & CHANDRA SEKAR, C. (1936). Biometrika, 28, 308.

PEARSON, K. (1894). Phil. Trans. A, 185, 71.

PRZIBRAM, H. & MEGUŠAR, F. (1912). Roux Arch. EntwMech. Organ. 34, 680.

SEYLER, C. A. (1943). J. Inst. Fuel, 16, 134.

SEYMOUR SEWELL, R. B. (1912). Rec. Indian Mus. 7, 313.

SVEDBERG, T. (1939). Proc. Roy. Soc. A, 170, 40 and B, 127, 1.

THOM, A. (1955). J. R. Statist. Soc. B (in the Press).

WIGGLESWORTH, V. B. (1942). Principles of Insect Physiology, 2nd ed. London: Methuen.




THE TRUNCATED NEGATIVE BINOMIAL DISTRIBUTION 


By M. R. SAMPFORD 
Agricultural Research Council Unit of Statistics, University of Aberdeen


1. INTRODUCTION 


The negative binomial distribution has been discussed by, inter alia, Greenwood & Yule 
(1920), Fisher (1941), Haldane (1941), Anscombe (1950) and Bliss & Fisher (1953), and is 
extensively used for the description of data too heterogeneous to be fitted by a Poisson 
distribution. Observed samples, however, may be truncated, in the sense that the number 
of individuals falling into the zero class cannot be determined. For example, if chromosome 
breaks in irradiated tissue can occur only in those cells which are at a particular stage of 
the mitotic cycle at the time of irradiation, a cell can be demonstrated to have been at that 
stage only if breaks actually occur. Thus in the distribution of breaks per cell, cells not 
susceptible to breakage are indistinguishable from susceptible cells in which no breaks occur. 
Methods for estimation of the parameters of the truncated distribution are considered 
in this paper. The corresponding problem of estimation of the truncated Poisson distribution 
has been discussed by David & Johnson (1952), who also discuss the present problem. 


2. THE MOMENTS OF THE TRUNCATED DISTRIBUTION 
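The negative binomial distribution has the form (in Fisher’s notation)

$$P(r) = \frac{(k + r - 1)!}{(k - 1)!\, r!}\, \frac{p^r}{(1 + p)^{k + r}} \qquad (r = 0, 1, \ldots;\ k > 0,\ p > 0), \tag{1}$$

so that $P(0) = 1/(1 + p)^k$.

To obtain the corresponding probabilities for the truncated distribution the form (1)
must be divided by $(1 - P(0))$; writing

$$w = 1/(1 + p), \qquad \eta = 1 - w,$$

it follows that

$$P_T(r) = \frac{w^k}{1 - w^k}\, \frac{(k + r - 1)!}{(k - 1)!\, r!}\, \eta^r \qquad (r = 1, 2, \ldots). \tag{2}$$

The factorial moments of this distribution are

$$\mu_{[i]} = \frac{(k + i - 1)!}{(k - 1)!}\, \frac{\eta^i}{w^i (1 - w^k)}, \tag{3}$$

whence

$$\mu_1' = \frac{k\eta}{w(1 - w^k)},$$

$$\mu_2' = \frac{k\eta + k^2\eta^2}{w^2(1 - w^k)},$$

$$\mu_3' = \frac{k\eta + k(3k + 1)\eta^2 + k^3\eta^3}{w^3(1 - w^k)},$$

$$\mu_4' = \frac{k\eta(6 - 6w + w^2) + k^2\eta^2(11 - 4w) + 6k^3\eta^3 + k^4\eta^4}{w^4(1 - w^k)}.$$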
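
The moment formulae can be checked against direct summation of the truncated probabilities. The sketch below is an added illustration, not part of the original paper; it assumes the truncated form $P_T(r) = w^k (1 - w^k)^{-1}\, \Gamma(k + r)/\{\Gamma(k)\, r!\}\, (1 - w)^r$, uses the log-gamma function so that non-integral $k$ is allowed, and truncates the infinite sums at an arbitrary `r_max`. The function names are hypothetical.

```python
import math


def truncated_nb_pmf(r, k, w):
    """P_T(r) for r = 1, 2, ..., with w = 1/(1+p) and eta = 1 - w."""
    eta = 1 - w
    log_term = (k * math.log(w) + math.lgamma(k + r) - math.lgamma(k)
                - math.lgamma(r + 1) + r * math.log(eta))
    return math.exp(log_term) / (1 - w ** k)


def raw_moment_numeric(order, k, w, r_max=2000):
    """Raw moment about the origin by direct summation (sum cut at r_max)."""
    return sum(r ** order * truncated_nb_pmf(r, k, w) for r in range(1, r_max + 1))


def raw_moments_formula(k, w):
    """First three raw moments from the closed expressions."""
    eta = 1 - w
    d = 1 - w ** k
    m1 = k * eta / (w * d)
    m2 = (k * eta + k ** 2 * eta ** 2) / (w ** 2 * d)
    m3 = (k * eta + k * (3 * k + 1) * eta ** 2 + k ** 3 * eta ** 3) / (w ** 3 * d)
    return m1, m2, m3


# For k = 1, w = 0.5 the truncated distribution is geometric: P_T(r) = 0.5**r.
geometric = raw_moments_formula(1.0, 0.5)
```

For $k = 1$, $w = 0.5$ the three moments are 2, 6 and 26, which are the known values for that geometric distribution.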




3. ESTIMATION BY MOMENTS 


David & Johnson (1952) suggest that the use of estimates of less than maximum efficiency
is justifiable only if they are directly obtainable as explicit solutions of easily constructed
equations. In discussing the truncated negative binomial, therefore, they do not consider
estimates based on the first two sample moments, which do not provide explicit solutions,
but confine their attention to a method using certain ratios of the first three moments,
i.e. sample estimates of $\mu_2'/\mu_1'$ and $\mu_3'/\mu_1'$. The estimates (only that of $p$ is discussed in detail)
are obtained easily enough, but, in consequence of the introduction of the third sample
moment, are extremely inefficient. (For example, the efficiency for values of the parameters
equivalent to $k = 1$, $w = 0.5$ is as low as 1.7 %.) David & Johnson therefore abandon
completely the use of moments and recommend the maximum-likelihood method for use
in all cases.

Whether, in fact, any particular inefficient procedure is acceptable can only depend on 
the loss of information resulting from its use, the time saved by it, and the relative costs 
of the time or labour spent on observation and on analysis. Thus if an experiment involves 
observations on hundreds of experimental animals, made over a period of several years, 
ten or even a hundred hours of calculation may be dearly saved at the cost of 10 % of the 
information so laboriously accumulated. If, on the other hand, the observations are made 
easily and at no great cost, the use of a convenient but statistically ‘inefficient’ method of 
analysis, coupled with an appropriate increase in sample size, may be far more ‘efficient’ 
than a tedious maximum-likelihood calculation, in the sense of giving the same amount of 
information at a lower cost. In this section I give a trial-and-error method for solving the 
moment equations using the first two moments. This method, though not explicit, does not 
take many minutes to carry through. It certainly entails far less labour than the maxi- 
mum-likelihood calculations, and the estimates obtained have a percentage efficiency of 
80 or upwards for all but the most unfavourable combinations of parameters: for $k = 1$,
$w = 0.5$ the efficiency is 77.5 % (§ 6, Table 2), as compared with the 1.7 % of the
‘three-moment’ estimate. This method can thus be recommended as a reasonable alternative to
the maximum-likelihood calculations, in circumstances where a method of less than 100 %
efficiency seems likely to prove acceptable.
Equating the population mean and variance to the corresponding sample values gives
equations for $k$ and $w$:

$$\frac{k\eta}{w(1 - w^k)} = m, \qquad \frac{1 + k\eta}{w} = m + \frac{s^2}{m}. \tag{4}$$

These equations can be solved by trial and error, for which purpose they are most
conveniently expressed in the form

$$k = \left(m + \frac{s^2}{m} - 1\right)\frac{w}{1 - w} - 1, \qquad \xi \equiv m w^{k+1} + \frac{s^2}{m}\, w - 1 = 0. \tag{5}$$

The value of $\xi$ can be evaluated for selected values of $w$, and a solution, if one exists, reached
by successive linear interpolations.


To investigate whether a solution exists, and to simplify still further the computations
required, we consider the function

$$\phi(x) = -\frac{x \log x}{1 - x}. \tag{6}$$

This function is tabulated, for values of $x$ between 0 and 1, in Table 4, and can be shown
to possess the properties

$$\phi(x) \to 0, 1 \quad \text{as} \quad x \to 0, 1,$$

$$\phi'(x) > 0, \quad \phi''(x) < 0, \quad 0 < x < 1.$$


The second of equations (5) then takes the form

$$\zeta(w) \equiv m \exp\left\{-\left(m + \frac{s^2}{m} - 1\right)\phi(w)\right\} + \frac{s^2}{m}\, w = 1. \tag{7}$$

If all the observed values equal 1, $m = 1$ and $s^2 = 0$, and $\zeta \equiv 1$. Thus in this situation the
moment estimation procedure, naturally enough, fails. In all further discussion, therefore,
$m$ will be assumed $> 1$.

It is easily verified that

$$w_0 = \frac{m}{m^2 + s^2} \quad (< 1)$$

satisfies equation (7). However, the corresponding value of $k$, obtained from the first of
equations (5), is 0. $w_0$, therefore, and any lower values of $w$ (corresponding to negative
values of $k$) are inadmissible: we require a solution of (7) in the range

$$w_0 < w < 1.$$

We have
$$\zeta(0) = m > 1, \quad \zeta(w_0) = 1, \quad \zeta'' > 0 \quad (0 < w < 1).$$

(The result on $\zeta''$ follows from that on $\phi''$ quoted above.) There exists, therefore, at most one
solution of (7) other than $w_0$ in the range $0 < w < 1$. The condition that there shall be another
solution less than 1 is
$$\zeta(1) > 1,$$
the condition that it shall be greater than $w_0$ is
$$\zeta'(w_0) < 0.$$


These inequalities reduce to limitations on the relative magnitudes of $m$ and $s^2$, most
conveniently expressed in the form


icem) e mtf (s. (8) 


Equation (7) can be solved either iteratively, writing

$$w_{i+1} = \frac{m}{s^2}\left[1 - m \exp\left\{-\left(m + \frac{s^2}{m} - 1\right)\phi(w_i)\right\}\right], \tag{9}$$

or by trial and error. Inasmuch as the solution must lie on that part of the $\zeta$ curve for which
$\zeta' > 0$, between the (unknown) minimum of $\zeta$ and the point $w = 1$, it is most convenient to
take as the first trial value of $w$ either 1, or, if it is less than 1, $w = m/s^2$ (clearly $\zeta > 1$ for
this value), and to work down to the solution. This approach will ensure the convergence of
the iterative procedure (9), and should minimize the amount of labour required for the
trial-and-error procedure. In the latter case time will probably be saved by calculating,
for the first trial value of $w$, the slope

$$\zeta'(w) = \frac{s^2}{m} - m\left(m + \frac{s^2}{m} - 1\right)\left[\frac{\phi(w) - w}{w(1 - w)}\right]\exp\left\{-\left(m + \frac{s^2}{m} - 1\right)\phi(w)\right\}. \tag{10}$$

Except for the function in square brackets ($= \phi'(w)$) all the terms used in calculating this
expression will already have been evaluated in the calculation of $\zeta$. This slope can be used
in choosing the second trial value, after which one or two linear interpolations or
extrapolations should be sufficient. If $w = 1$ is taken as the initial approximation, $\phi'(w)$ should be
replaced by its limiting value of $\tfrac{1}{2}$.
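
The first step of the trial-and-error procedure can be sketched in Python as follows. This is an illustration under assumptions, not the author's computing scheme: $\phi(x)$ is taken to be $-x \log x/(1 - x)$, a form that reproduces the tabulated value $\phi(0.3461) = 0.5616$ and the limiting slope $\tfrac{1}{2}$ at $x = 1$; the function names are hypothetical; and the sample moments $m = 3.4375$, $s^2 = 9.9315$ are those quoted in the example of § 4.

```python
import math


def phi(x):
    """Assumed form of the tabulated function: -x*log(x)/(1-x), with phi(1) = 1."""
    return 1.0 if x == 1 else -x * math.log(x) / (1 - x)


def phi_prime(x):
    """Slope of phi, written as (phi - x)/(x(1-x)); its limit at x = 1 is 1/2."""
    return 0.5 if x == 1 else (phi(x) - x) / (x * (1 - x))


def zeta(w, m, s2):
    """Left-hand side of the moment equation, to be equated to 1."""
    c = m + s2 / m - 1
    return m * math.exp(-c * phi(w)) + (s2 / m) * w


def zeta_prime(w, m, s2):
    """Slope of zeta, used to choose the second trial value."""
    c = m + s2 / m - 1
    return s2 / m - m * c * phi_prime(w) * math.exp(-c * phi(w))


m, s2 = 3.4375, 9.9315
w_first = min(1.0, m / s2)                                   # first trial value
w_second = w_first - (zeta(w_first, m, s2) - 1) / zeta_prime(w_first, m, s2)
```

With these moments the first trial value is about 0.3461, where $\zeta$ is about 1.1726 and the slope about 2.0138, so that the second trial value is about 0.2604.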


4. EXAMPLE OF ESTIMATION BY MOMENTS


In an investigation into chromosome breakage, the following sample distribution of breaks
per cell was obtained:

$r$ = 1(11), 2(6), 3(4), 4(5), 6(1), 8(2), 9, 11, 13.
$n = 32$, $\sum r = 110$, $\sum r^2 = 686$.
$m = 3.4375$, $s^2 = 9.9315$, $s^2/m = 2.8892$.

Equation (7) is thus

$$3.4375\, e^{-5.3267\phi} + 2.8892\, w = 1.$$

Taking as the first trial value

$$w = 1/2.8892 = 0.3461,$$

we obtain from Table 4

$$\phi = 0.5616,$$

whence $(\phi - w)/w(1 - w) = 0.9522$.
Then $5.3267\phi = 2.9915$,
and from tables of the negative exponential function, or of natural logarithms,

$$e^{-2.9915} = 0.05021, \qquad 3.4375\, e^{-2.9915} = 0.1726,$$

whence $\zeta = 0.1726 + 1.0000 = 1.1726$

and $\zeta' = 2.8892 - (0.1726 \times 5.3267 \times 0.9522) = 2.0138$,

suggesting, as the next trial value,

$$w = 0.3461 - 0.1726/2.0138 = 0.2604.$$


62 Truncated negative binomial distribution 

The remainder of the trial-and-error solution, shown in Table 1 (a), leads to the value
$$w = 0.2346,$$
whence $k = 5.3267 \times 0.2346/0.7654 - 1 = 0.633$.

Alternatively, by the iterative method, starting from the same initial value, the second
approximation is $w_2 = (1 - 0.1726)/2.8892 = 0.2864$.
The remainder of the calculations, leading after sixteen cycles to the same solution, are
shown in Table 1 (b).


Table 1. Solution of the moment estimation equation for the example of §4


(a) By trial and error

w         φ         3.4375e^(-5.3267φ)    ξ         ξ′
0.3461    0.5616    0.1726                1.1726    2.0137
0.2604    0.4737    0.2757                1.0280
0.2438    0.4550    0.3046                1.0090
0.2359    0.4459    0.3197
0.2346    0.4444    0.3222


(b) By iteration

w         φ         3.4375e^(-5.3267φ)
0.3461    0.5616    0.1726
0.2864    0.5018    0.2374
0.2639    0.4776    0.2700
0.2527    0.4651    0.2886
0.2462    0.4578    0.3000
0.2423    0.4533    0.3073
0.2397    0.4503    0.3123
0.2380    0.4484    0.3154
0.2370    0.4472    0.3175
0.2362    0.4463    0.3190
0.2357    0.4457    0.3200
0.2353    0.4452    0.3209
0.2350    0.4448    0.3216
0.2348    0.4446    0.3219
0.2347    0.4445    0.3221
0.2346    0.4444    0.3222
0.2346


(Intermediate stages of the calculations are shown in Table 1 for the sake of the example.
In practice either calculation can be carried through on a desk computer, recording only the
successive values of $w$ and, for the trial-and-error process, $\xi$.)
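
The iterative solution can be retraced numerically. The following Python sketch is illustrative only: it assumes $\phi(w) = -w\log w/(1-w)$ (the form that reproduces Table 4), and the helper names `phi` and `moment_fit` are hypothetical. It applies the iterative form (9) to the chromosome-breakage data, starting from $w = m/s^2$, and then recovers $k$ from $w$.

```python
import math

def phi(w):
    """Assumed form of the tabulated function: phi(w) = -w ln(w) / (1 - w)."""
    if w >= 1.0:
        return 1.0  # limiting value as w -> 1
    return -w * math.log(w) / (1.0 - w)

def moment_fit(counts, tol=1e-10, max_iter=1000):
    """counts maps each observed r (>= 1) to its frequency n_r."""
    n = sum(counts.values())
    m = sum(r * f for r, f in counts.items()) / n
    s2 = (sum(r * r * f for r, f in counts.items()) - n * m * m) / (n - 1)
    c = m + s2 / m - 1.0  # the constant 5.3267 of the worked example

    def xi(w):  # left-hand side of equation (7)
        return m * math.exp(-c * phi(w)) + (s2 / m) * w

    w = min(1.0, m / s2)  # first trial value recommended in the text
    for _ in range(max_iter):
        w_new = (m / s2) * (1.0 - m * math.exp(-c * phi(w)))  # equation (9)
        if w_new <= 0.0:
            raise ValueError("no admissible solution for these data")
        converged = abs(w_new - w) < tol
        w = w_new
        if converged:
            break
    k = c * w / (1.0 - w) - 1.0
    return m, s2, w, k, xi(w)

counts = {1: 11, 2: 6, 3: 4, 4: 5, 6: 1, 8: 2, 9: 1, 11: 1, 13: 1}
m, s2, w, k, xi_at_root = moment_fit(counts)
print(round(m, 4), round(s2, 4), round(w, 4), round(k, 3))
```

Because $\phi$ is evaluated to full precision here rather than read from a four-figure table, the last printed digit may differ slightly from the hand calculation.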

Four decimal places have been retained in the estimation of $w$ as a demonstration of




5. THE VARIANCES OF THE MOMENT ESTIMATES 


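The asymptotic variances and covariances of the moment estimates of $w$ and $k$ are given by

$$J^2 V(w) = \left(\frac{\partial\mu_2'}{\partial k}\right)^2 V(m_1') - 2\,\frac{\partial\mu_1'}{\partial k}\frac{\partial\mu_2'}{\partial k}\operatorname{cov}(m_1', m_2') + \left(\frac{\partial\mu_1'}{\partial k}\right)^2 V(m_2'),$$

$$J^2 \operatorname{cov}(w, k) = -\frac{\partial\mu_2'}{\partial k}\frac{\partial\mu_2'}{\partial w} V(m_1') + \left(\frac{\partial\mu_1'}{\partial k}\frac{\partial\mu_2'}{\partial w} + \frac{\partial\mu_1'}{\partial w}\frac{\partial\mu_2'}{\partial k}\right)\operatorname{cov}(m_1', m_2') - \frac{\partial\mu_1'}{\partial k}\frac{\partial\mu_1'}{\partial w} V(m_2'),$$

$$J^2 V(k) = \left(\frac{\partial\mu_2'}{\partial w}\right)^2 V(m_1') - 2\,\frac{\partial\mu_1'}{\partial w}\frac{\partial\mu_2'}{\partial w}\operatorname{cov}(m_1', m_2') + \left(\frac{\partial\mu_1'}{\partial w}\right)^2 V(m_2'),$$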


where Ai -4 q  - $(05)), 


ks ' , 
zi = арония, 


д + + + 
x = fi - P e-o, 


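$$J = \frac{\partial\mu_1'}{\partial k}\frac{\partial\mu_2'}{\partial w} - \frac{\partial\mu_1'}{\partial w}\frac{\partial\mu_2'}{\partial k},$$

and $nV(m_1') = \mu_2' - \mu_1'^2$, $n\operatorname{cov}(m_1', m_2') = \mu_3' - \mu_1'\mu_2'$, $nV(m_2') = \mu_4' - \mu_2'^2$.
Inserting into these formulae the moment estimates obtained in the example of the previous
section,
$$w = 0.2346, \quad k = 0.633,$$
we have ($\log_e w = -1.44988$, $w^k = 0.3994$):

$\mu_1' = 3.438578$, $\mu_2' = 21.758578$, $\mu_3' = 215.773952$, $\mu_4' = 2941.290671$,
whence $V(m_1') = 0.31046123$, $\operatorname{cov}(m_1', m_2') = 4.4048558$, $V(m_2') = 77.120467$.
Also $\partial\mu_1'/\partial k = 2.1164$, $\partial\mu_1'/\partial w = -12.9798$, $\partial\mu_2'/\partial k = 24.6106$, $\partial\mu_2'/\partial w = -184.1591$,

$$J = -70.31365336,$$

whence $V(w) = 0.015091$, $\operatorname{cov}(w, k) = 0.08125$, $V(k) = 0.4983$.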
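
The arithmetic of this section can be reproduced from the quoted moments and derivatives. The Python sketch below is an illustration under stated assumptions: the function name `moment_covariance` is hypothetical, and the formulas coded are the standard large-sample (delta-method) expressions for inverting the map from $(w, k)$ to $(\mu_1', \mu_2')$, with $J$ signed so that it reproduces the value $-70.31$ given in the text.

```python
def moment_covariance(d1w, d1k, d2w, d2k, v1, c12, v2):
    """Variances and covariance of (w, k) from those of (m1', m2').

    d1w = d mu1'/dw, d1k = d mu1'/dk, d2w = d mu2'/dw, d2k = d mu2'/dk.
    """
    J = d1k * d2w - d1w * d2k
    var_w = (d2k ** 2 * v1 - 2 * d1k * d2k * c12 + d1k ** 2 * v2) / J ** 2
    cov_wk = (-d2k * d2w * v1 + (d1k * d2w + d1w * d2k) * c12
              - d1k * d1w * v2) / J ** 2
    var_k = (d2w ** 2 * v1 - 2 * d1w * d2w * c12 + d1w ** 2 * v2) / J ** 2
    return J, var_w, cov_wk, var_k

# Values quoted in the text for the example of section 4.
n = 32
mu1, mu2, mu3, mu4 = 3.438578, 21.758578, 215.773952, 2941.290671
v1 = (mu2 - mu1 ** 2) / n       # V(m1')
c12 = (mu3 - mu1 * mu2) / n     # cov(m1', m2')
v2 = (mu4 - mu2 ** 2) / n       # V(m2')

J, var_w, cov_wk, var_k = moment_covariance(
    d1w=-12.9798, d1k=2.1164, d2w=-184.1591, d2k=24.6106,
    v1=v1, c12=c12, v2=v2)
print(J, var_w, cov_wk, var_k)
```

Since the derivatives are entered to four decimal places only, agreement with the printed results is to about four significant figures.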


6. THE EFFICIENCY OF THE MOMENT PROCEDURE 


It has been shown by Fisher (1941) that the method of moments may be seriously inefficient 
for the estimation, from complete data, of the parameters of the negative binomial dis- 
tribution. Investigation of the corresponding efficiency for estimation from truncated data 


therefore seems desirable. 
The determinant of the variance-covariance matrix of the moment estimates reduces to 


2(k-- 1)o*(1— v5. {1—o*[1 + ky + }k(k + 1) g*]) 
n?» nile —w[1— k?y — k(k + 1)log 9])2 
The determinant of the information matrix for maximum efficiency is 
n*(1— (57 a (r-1)k!  ko*(n+loga 
w*(1—w*? |, т (k+r—1)! ТЕТУ LUN MIS 
NE St 


whence the reciprocal of the efficiency of the moment method, given by the product of 
these two expressions, is 


1 От ky Е 1) 1 о + ky} 
Е {1-o*[1 – £5 — (Е 1) logo}? 
ә 9 *(r-1! (6-1). 2h(k+ 1) ек log w]? 
ee sea teres bee i 


Table 2 shows the percentage efficiency of the moment method for selected values of $k$,
and of the mean of the complete distribution. From this it is clear that even for quite
small means there may be a serious loss of efficiency for low values of $k$.

For the example of §4, $k$ was estimated as 0.633, and the mean of the complete distribution
$= k(1 - w)/w = 2.065$, from which estimation by moments would appear to be about 70%
efficient in this case.


Table 2. Percentage efficiency of the moment method of estimation, for selected 
values of k and of the mean of the complete distribution 


1 2 3 4 
90-6 95-7 97-5 98-8 
847 92-4 95-4 96-9 
77:5 87-5 92-0 94-4 
67-3 79-2 85-5 89-2 


7. ESTIMATION BY MAXIMUM LIKELIHOOD 


In view of the results of the previous section, it seems desirable to consider the maximum- 
likelihood estimation procedure in some detail. 


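The log likelihood is

$$\log L = \sum_{r=1}^{\infty} n_r \log\left\{\frac{w^k (1-w)^r}{1 - w^k}\,\frac{(k+r-1)!}{(k-1)!\,r!}\right\}$$

$$= nk\log w - n\log(1 - w^k) + \sum_{r=1}^{\infty} r n_r \log(1 - w) - \sum_{r=1}^{\infty} n_r \log r! + \sum_{r=1}^{\infty} n_r \sum_{j=1}^{r} \log(k + j - 1),$$

giving the maximum-likelihood equations

$$\frac{nk}{w(1 - w^k)} - \frac{nm}{1 - w} = 0 \qquad (11)$$

and

$$\frac{n\log w}{1 - w^k} + \sum_{r=1}^{\infty} n_r \sum_{j=1}^{r} (k + j - 1)^{-1} = 0; \qquad (12)$$

which, following Haldane (1941), is conveniently rewritten as

$$\frac{n\log w}{1 - w^k} + \sum_{j=1}^{R} (k + j - 1)^{-1} \sum_{r=j}^{R} n_r = 0, \qquad (13)$$

where $R$ is the highest observed value of $r$.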


These equations are easily soluble by the usual maximum-likelihood iterative procedure.
The components of the information matrix are the expected values of the quantities

$$-\frac{\partial^2 \log L}{\partial w^2} = \frac{nk[1 - (k+1)w^k]}{w^2(1 - w^k)^2} + \frac{nm}{(1 - w)^2},$$

$$-\frac{\partial^2 \log L}{\partial w\,\partial k} = -\frac{n[1 - w^k + k w^k \log w]}{w(1 - w^k)^2}, \qquad (14)$$

$$-\frac{\partial^2 \log L}{\partial k^2} = \sum_{j=1}^{R} (k + j - 1)^{-2} \sum_{r=j}^{R} n_r - \frac{n(\log w)^2 w^k}{(1 - w^k)^2}.$$

The iteration, however, is most conveniently carried out using the quantities (14) themselves,
rather than their expected values. An example of the calculations for this method is given
at the end of the next section.


8. AN ALTERNATIVE METHOD FOR THE SOLUTION OF THE 
MAXIMUM-LIKELIHOOD EQUATIONS 


From equations (11) and (13) it follows that

$$\hat\phi = -\frac{\hat w \log \hat w}{1 - \hat w} = \frac{\hat k}{nm} \sum_{j=1}^{R} (\hat k + j - 1)^{-1} \sum_{r=j}^{R} n_r, \qquad (15)$$

where $\hat w$ and $\hat k$ are maximum-likelihood estimates, and $m$ is the sample mean. Equation (11)
can be rewritten

$$\psi(\hat w, \hat k) = \frac{\hat w(1 - \hat w^{\hat k})\,m}{\hat k(1 - \hat w)} = 1. \qquad (16)$$

For a given value of $k$, equation (15) can be used to evaluate the corresponding $\phi$ and hence,
from Table 4, $w$; the value of $\psi$ can then be calculated. This form of the equations was
presented by David & Johnson (1952); they suggested an iterative method, equivalent in
the present notation to
$$\hat k_{i+1} = \hat k_i\,\psi(\hat w_i, \hat k_i),$$
and provided a table of values of $p = (1 - w)/w$ for values of $\phi$ (in their notation, y)
0.40 (0.05) 1.00. This process, however, appears to converge rather slowly, and a solution
can, in many cases, be reached more expeditiously by a trial-and-error process, starting
with the moment, or some other inefficient, estimate of $k$. For this value of $k$ the value of $\psi$
can be evaluated, together with the slope of the curve ($\psi = \psi(k)$) defined by (15) and (16):

$$\psi'(k) = \frac{\psi - m w^{k+1}}{\phi - w}\left[\frac{\phi}{k} - \frac{k}{nm}\sum_{j=1}^{R} (k + j - 1)^{-2} \sum_{r=j}^{R} n_r\right] - \frac{\psi}{k}\left[1 - \phi(w^k)\right]. \qquad (17)$$

These values provide a second approximation to the root, which can then be found by a
process of successive linear interpolation and extrapolation. Final adjustments may be
made, if required, by the usual maximum-likelihood iterative procedure: this will usually
be unnecessary, but the quantities (14) can be calculated, to provide an estimate of the
asymptotic variance-covariance matrix of the estimates.
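
The search for the root of $\psi(k) = 1$ can be sketched in a few lines of Python. This is a hedged illustration, not the author's procedure: it assumes $\phi(w) = -w\log w/(1-w)$, inverts $\phi$ by bisection instead of reading Table 4, and replaces the hand interpolation by bisection on $k$. The function names and the bracket (0.30, 0.63) are assumptions; 0.63 is the rounded moment estimate, and 0.30 is simply a value at which $\psi$ exceeds 1 for the chromosome-breakage data of §4.

```python
import math

def phi(w):
    """Assumed form of the tabulated function."""
    return -w * math.log(w) / (1.0 - w)

def invert_phi(target, tol=1e-12):
    """Solve phi(w) = target for w in (0, 1); phi increases from 0 to 1."""
    lo, hi = 1e-12, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi_and_w(k, tail, n, m):
    """tail[j-1] holds sum_{r >= j} n_r for j = 1..R."""
    weighted = sum(N / (k + j0) for j0, N in enumerate(tail))  # k + j - 1
    ph = k * weighted / (n * m)                        # equation (15)
    w = invert_phi(ph)
    psi = w * (1.0 - w ** k) * m / (k * (1.0 - w))     # equation (16)
    return psi, w

def ml_fit(counts, k_lo, k_hi, tol=1e-10):
    R = max(counts)
    n = sum(counts.values())
    m = sum(r * f for r, f in counts.items()) / n
    tail = [sum(f for r, f in counts.items() if r >= j) for j in range(1, R + 1)]

    def g(k):
        return psi_and_w(k, tail, n, m)[0] - 1.0

    if g(k_lo) * g(k_hi) > 0:
        raise ValueError("bracket does not straddle a root of psi(k) = 1")
    while k_hi - k_lo > tol:
        mid = 0.5 * (k_lo + k_hi)
        if g(k_lo) * g(mid) <= 0:
            k_hi = mid
        else:
            k_lo = mid
    k = 0.5 * (k_lo + k_hi)
    return k, psi_and_w(k, tail, n, m)[1], tail, n, m

counts = {1: 11, 2: 6, 3: 4, 4: 5, 6: 1, 8: 2, 9: 1, 11: 1, 13: 1}
k_hat, w_hat, tail, n, m = ml_fit(counts, 0.30, 0.63)
psi_first_trial = psi_and_w(0.63, tail, n, m)[0]
print(round(k_hat, 3), round(w_hat, 4), round(psi_first_trial, 4))
```

Bisection is slower than linear interpolation but needs no slope, which keeps the sketch short; the slope (17) would only be needed for a Newton-type step.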

As with the moment method, extreme samples can occur for which the equations (15)
and (16) have no solutions with $k > 0$. Unfortunately, the function $\psi(k)$ is of considerably
more complicated a form than the function $\phi(w)$ occurring in the moment method, and the
existence problem is correspondingly more difficult of solution. It can be shown that


(0) =1, 
12 1 R 
¥'(0) = = У = X nt bloga, 


ту=3()— 1) r= 


where d(x) = 1/m, 
and that lim y(k) = m(1—e~*)/6, 
ko 
25 . z 
where 0 = nm jas U= E n,. 


Clearly the conditions
$$\psi'(0) > 0, \quad \lim_{k\to\infty} \psi(k) < 1$$
are sufficient to ensure the existence of at least one solution, and I would conjecture that
they are necessary, and also sufficient to ensure uniqueness, but have been unable to prove
these results.

The question remains: What action is to be taken when the maximum-likelihood (or
moment) method fails to give an acceptable solution? It seems reasonable to hope that one
or other of the two limiting forms of the negative binomial, the Poisson distribution and
Fisher's 'logarithmic series' distribution, will provide an adequate fit for all but some
pathologically extreme (and highly improbable) samples which are unlikely to be fitted
satisfactorily by any meaningful distribution.


9. AN EXAMPLE OF THE MAXIMUM-LIKELIHOOD CALCULATIONS 


The data are those already used in §4; details of the calculations are shown in Table 3. The
quantity $\sum_{r=j}^{R} n_r$, used in the calculation of $\phi$, is tabulated in the third column of Table 3 (b),
and the remainder of the calculations are shown in some detail in Table 3 (a). In fact, this
table shows far more detail than need be recorded in practice; if the calculations are made on
a desk machine only the first three columns of Table 3 (b) and the successive values of $k$, $w$
and $\psi$ need be written down. (The weighted sums of quotients in the second column of
Table 3 (a) can be accumulated on the machine, and they, together with the values of $\phi$,
$\log w$ and $w^k$, are used as soon as they are calculated, and need not be written down.)
However, $\phi$ and $w^k$ should be recorded for the first trial value, for use in evaluating $\psi'$,
and if final adjustments are to be made by the iterative process, all the quantities tabulated
in Table 3 (a) should be recorded for the last trial value of $k$. The fourth and fifth columns of
Table 3 (b) are recorded for the purpose of calculating $\sum_{j=1}^{R} (k + j - 1)^{-2} \sum_{r=j}^{R} n_r$; the fourth,
$(0.63 + j - 1)^2$, for use in calculating $\psi'$; the fifth, $(0.493 + j - 1)^2$, for use in calculating the
variance-covariance matrix. The only remaining values required in this section are

$$nm = 110, \quad m = 3.4375.$$

As an initial trial value the moment estimate of $k$, 0.633, was rounded to two decimal
places, giving $\psi(0.63) = 0.9922$.
As a guide to the location of the second trial value, $\psi'$ was evaluated for $k = 0.63$. Accumulating
on the machine,
$$\sum_{j=1}^{R} (0.63 + j - 1)^{-2} \sum_{r=j}^{R} n_r = 92.2942,$$
whence
$$\psi' = \frac{(0.9922 - 0.3179)(0.7008 - 0.5286)}{0.2094} - \frac{0.9922 \times 0.3905}{0.63} = -0.0605.$$

The second trial value was therefore taken as
$$0.63 - (1 - 0.9922)/0.0605 = 0.50.$$
The fourth value, $k = 0.493$, gives a value of $\psi$ equal to 1 to four decimal places. There is
no hope of improving further on this estimate by the trial-and-error method, without
taking more decimal places, and using a more complicated interpolation formula, in the
table of $\phi$. There is, in fact, no great need of any further improvement, but, if desired, final
increments to $k$ and $w$ can be calculated using the variance-covariance matrix, which requires
calculation in any case.

Table 3. Maximum-likelihood calculations for the example of §8

(a) Trial-and-error calculations

k        Σ_j (k+j-1)^(-1) Σ_{r≥j} n_r    φ         w
0.63     77.0905                          0.4415    0.2321
0.50     91.9239                          0.4178    0.2124
0.491    93.2185                          0.4161    0.2110
0.493    92.9270                          0.4165    0.2113

(b) Subsidiary tabulations

j     n_j    Σ_{r≥j} n_r    (j - 0.37)²    (j - 0.507)²
1     11     32             0.3969         0.243049
2     6      21             2.6569         2.229049
3     4      15             6.9169         6.215049
4     5      11             13.1769        12.201049
5     0      6              21.4369        20.187049
6     1      6              31.6969        30.173049
7     0      5              43.9569        42.159049
8     2      5              58.2169        56.145049
9     1      3              74.4769        72.131049
10    0      2              92.7369        90.117049
11    1      2              112.9969       110.103049
12    0      1              135.2569       132.089049
13    1      1              159.5169       156.075049

The fifth column of Table 3 (b) shows values of
$$(0.493 + j - 1)^2,$$
and from these values (accumulating quotients on the machine)

$$\sum_{j=1}^{R} (0.493 + j - 1)^{-2} \sum_{r=j}^{R} n_r = 145.196464173.$$


Then, from this value and the entries in the last line of Table 3 (a), the estimate of the
variance-covariance matrix, calculated from formulae (14), is

$$\begin{pmatrix} 554.4186 & -94.69298 \\ -94.69298 & 19.792976 \end{pmatrix}^{-1} = \begin{pmatrix} 0.00986279 & 0.0471853 \\ 0.0471853 & 0.276265 \end{pmatrix}.$$


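Inserting the values $k = 0.493$, $w = 0.2113$ into the left-hand sides of the maximum-likelihood
equations (11) and (12), we obtain 0.006 and $-0.0003$ respectively. Thus the final
increments are

$$\begin{pmatrix} \delta w \\ \delta k \end{pmatrix} = \begin{pmatrix} 0.00986279 & 0.0471853 \\ 0.0471853 & 0.276265 \end{pmatrix} \begin{pmatrix} 0.006 \\ -0.0003 \end{pmatrix} = \begin{pmatrix} 0.00005 \\ 0.00020 \end{pmatrix}.$$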
Table 4. The function φ(x) = -x log_e x/(1 - x)

x   | 0.02   | 0.03   | 0.04   | 0.05   | 0.06   | 0.07   | 0.08   | 0.09
0.0 | 0.0798 | 0.1085 | 0.1341 | 0.1577 | 0.1796 | 0.2002 | 0.2196 | 0.2381
0.1 | 0.2891 | 0.3049 | 0.3201 | 0.3348 | 0.3491 | 0.3629 | 0.3764 | 0.3896
0.2 | 0.4271 | 0.4390 | 0.4507 | 0.4621 | 0.4733 | 0.4843 | 0.4950 | 0.5056
0.3 | 0.5362 | 0.5461 | 0.5558 | 0.5653 | 0.5747 | 0.5839 | 0.5930 | 0.6020
0.4 | 0.6282 | 0.6367 | 0.6451 | 0.6533 | 0.6615 | 0.6695 | 0.6775 | 0.6854
0.5 | 0.7084 | 0.7159 | 0.7233 | 0.7307 | 0.7380 | 0.7451 | 0.7522 | 0.7593
0.6 | 0.7800 | 0.7867 | 0.7934 | 0.8000 | 0.8066 | 0.8131 | 0.8195 | 0.8259
0.7 | 0.8447 | 0.8509 | 0.8570 | 0.8630 | 0.8690 | 0.8750 | 0.8809 | 0.8868
0.8 | 0.9041 | 0.9097 | 0.9154 | 0.9209 | 0.9265 | 0.9320 | 0.9374 | 0.9429
0.9 |        |        | 0.9694 | 0.9746 | 0.9797 | 0.9848 | 0.9899 | 0.9950


The maximum-likelihood estimates are therefore
$$\hat w = 0.2113 \pm 0.0993, \quad \hat k = 0.493 \pm 0.526.$$

The variance-covariance matrix for the moment estimates, recalculated in terms of the
maximum-likelihood estimates to provide a more valid comparison, is

$$\begin{pmatrix} 0.01364125 & 0.0693156 \\ 0.0693156 & 0.405868 \end{pmatrix}.$$

The ratio of the determinants of the two matrices, converted to a percentage efficiency, is
$$100 \times 0.000498/0.000732 = 68.1\%.$$

The efficiency is much the same for the two parameters, the actual values being 72.3%
for $w$, and 68.1% for $k$.
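
These figures can be checked with a few lines of arithmetic. The Python sketch below is illustrative (the helper names `inv2` and `det2` are hypothetical): it inverts the matrix of quantities (14) quoted above, takes square roots of the diagonal for the standard errors, and forms the determinant ratio and the two single-parameter variance ratios.

```python
import math

def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inv2(a):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    d = det2(a)
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

# Matrix of the quantities (14) at k = 0.493, w = 0.2113 (order: w, k).
information = [[554.4186, -94.69298], [-94.69298, 19.792976]]
V_ml = inv2(information)

# Moment-estimate covariance matrix quoted in the text (same order).
V_mom = [[0.01364125, 0.0693156], [0.0693156, 0.405868]]

se_w = math.sqrt(V_ml[0][0])
se_k = math.sqrt(V_ml[1][1])
eff_joint = 100.0 * det2(V_ml) / det2(V_mom)
eff_w = 100.0 * V_ml[0][0] / V_mom[0][0]
eff_k = 100.0 * V_ml[1][1] / V_mom[1][1]
print(round(se_w, 4), round(se_k, 3),
      round(eff_joint, 1), round(eff_w, 1), round(eff_k, 1))
```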

The maximum-likelihood estimate of k does not differ significantly from 0, which suggests 
that the data might be adequately fitted by Fisher’s logarithmic series distribution. This, 
in fact, proves to be the case. The last four columns of Table 3 (b) show the expected numbers 
of cells with j breaks on the basis of four fitted distributions, a truncated Poisson (fitted by 
maximum likelihood), the two negative binomials fitted above, and the logarithmic series 
distribution (fitted by maximum likelihood). The fit of the Poisson distribution is obviously 
very poor; equally obviously the other three are all very good. The negative binomials
appear, superficially, to give a slightly better representation of the data, but this impression
is largely due to the very good agreement in the first group. In fact the justification for 
using the negative binomial rather than the logarithmic series distribution comes, not 
from the data presented here, but from the whole series of experiments, only one of which 
is used here. In this series some distributions could be fitted adequately by the truncated 
Poisson, some by the logarithmic series distribution, and some not very satisfactorily by 
either, but the negative binomial form, with appropriate k, gave а good fit in all cases. 


I am indebted to Dr C. E. Ford, of the Atomic Energy Research Establishment, Harwell,
for permission to use his data in my examples, and to Miss A. D. Outhwaite, for drawing
my attention to an omission from §3.


REFERENCES 


ANSCOMBE, F. J. (1950). Sampling theory of the negative binomial and logarithmic series distributions.
Biometrika, 37, 358.

BLISS, C. I. & FISHER, R. A. (1953). Fitting the negative binomial distribution to biological data, with
a note on the efficient fitting of the negative binomial. Biometrics, 9, 176.

DAVID, F. N. & JOHNSON, N. L. (1952). The truncated Poisson. Biometrics, 8, 275.

FISHER, R. A. (1941). The negative binomial distribution. Ann. Eugen., Lond., 11, 182.

GREENWOOD, M. & YULE, G. U. (1920). An enquiry into the nature of frequency distributions representative
of multiple happenings with particular reference to the occurrence of multiple attacks
of disease or of repeated accidents. J. R. Statist. Soc. 83, 255.

HALDANE, J. B. S. (1941). The fitting of binomial distributions. Ann. Eugen., Lond., 11, 179.




THE RANDOMIZATION ANALYSIS OF A GENERALIZED 
RANDOMIZED BLOCK DESIGN


By M. B. WILK 
Iowa State College 


1. INTRODUCTION 
(1.1) The experimental situation and design

Suppose that $t$ treatments are given whose properties (yields, responses, effects, etc.) we
wish to compare when they interact with a given set of $rs$ experimental units, the latter
being classified into $r$ blocks, each containing $s = pt$ units. Suppose, further, that an
experiment is carried out in which the treatments are applied at random to the experimental
units, with the restriction that each treatment appears with $p$ units in each of the
$r$ blocks. We refer to this design as the generalized randomized block design and note that it
includes as special cases the completely randomized design ($r = 1$, $p > 1$) and the randomized
block design ($r > 1$, $p = 1$).

The object of this paper is to study the basis for statistical inference which is provided
by the randomization procedure.


(1.2) Some previous work
The introduction of the device of randomization in the statistical design of experiments 
is due to R. A. Fisher (1926, 1935). A brief review of some important contributions to the 
problem of inference from randomized experiments is given by Wilk (1953). 


domized block design, and by Kempthorne (1952 b) for the completely randomized design, 
and the expectations under randomization of the analysis of variance mean squares for 
randomized blocks with non-additivity given by Kempthorne (1952 a), derive as special 
cases from the results (Wilk, 1953) on which the present paper is based. 


(1.3) Experimental error and randomization

In many experimental situations it seems reasonable to distinguish two sources of
experimental error, namely, the failure of different experimental units treated alike to
respond identically, and the inability to reproduce an applied treatment exactly. The first
of these, which we shall refer to as the unit error, stems from variation among the experimental
units. The second type, which we shall call the technical error, stems from limitations
on experimental technique.

Generally, the unit error is to be regarded as a fixed quantity associated with any given 




To focus attention on the basis for statistical inference which is provided by the random- 
ization procedure we shall in this paper assume that the only important source of experi- 
mental error is the unit errors. 


2. THE ANALYSIS OF VARIANCE 
(2.1) The conceptual underlying population

With the possible application of each of the treatments to each of the experimental units
we associate a real (unknown) number. This defines a set of $rst$ numbers which we take to be
the conceptual underlying population for this experiment. Since the fact of the experimental
situation is that each unit can be 'used' only once, the population defined is conceptual
in the sense that only a subset of $rs$ of the numbers of interest can be observed. The
scope of a statistical inference for this situation can be delineated by noting that the conceivable
totality of experimental information would be given by applying each treatment
to every experimental unit and observing the response.


(2.2) The population model

Let $i = 1, 2, ..., r$ denote the block number.

Let $j = 1, 2, ..., s$ denote the experimental unit number within each block, where $s = pt$.

Let $k = 1, 2, ..., t$ denote the treatment number.

Let $y_{ijk}$ represent the (conceptual) response which would be obtained if treatment $k$
were applied to the $j$th unit in the $i$th block. Thus our underlying population is the set of
(conceptual) unknown numbers $\{y_{ijk}\}$.

We will employ the usual dot convention for means, for example, $y_{..k} = \sum_{ij} y_{ijk}/(rs)$.

We now define

$$\mu = y_{...},$$
$$b_i = y_{i..} - y_{...},$$
$$t_k = y_{..k} - y_{...},$$
$$(bt)_{ik} = (y_{i.k} - y_{i..}) - (y_{..k} - y_{...}),$$
$$e_{ij} = y_{ij.} - y_{i..},$$
$$\eta_{ijk} = (y_{ijk} - y_{ij.}) - (y_{i.k} - y_{i..}),$$

where $i = 1, 2, ..., r$; $j = 1, 2, ..., s$; $k = 1, 2, ..., t$.

These quantities may be given a physical interpretation.

$\mu$ is the (conceptual) overall mean yield which would be obtained if each treatment were
applied to every unit in every block.

$b_i$ is the difference between the (conceptual) mean yield of all treatments on all units of
block $i$ and $\mu$, and may be thought of as the effect attributable to the $i$th block.

In an analogous way $t_k$ may be thought of as the effect attributable to the $k$th treatment.
We note that by definition this is the average effect over all experimental units.

$(bt)_{ik}$ is the difference between the effect of treatment $k$ on block $i$ and $t_k$. Thus it is a measure
of the extent to which treatment $k$ and block $i$ interact and will be called the block-treatment
interaction.

$e_{ij}$ gives the difference between the (conceptual) mean of the yields of all treatments on
unit $j$ of block $i$ and the mean over the whole block. It, therefore, measures the extent to
which the $j$th unit deviates from the other units of block $i$ and will be called the unit error.
The set $\{e_{ij}\}$, $j = 1, 2, ..., s$, can be used to give a measure of the heterogeneity of units within
the $i$th block.

$\eta_{ijk}$ is the difference between the effect of treatment $k$ on unit $j$ of block $i$ and the effect
of treatment $k$ over all of block $i$. It is, therefore, a measure of the extent to which treatment
$k$ and unit $j$ of block $i$ interact and will be called the unit-treatment interaction.

The following equation is an algebraic identity:

$$y_{ijk} = \mu + b_i + t_k + (bt)_{ik} + e_{ij} + \eta_{ijk}.$$

From the definition of the quantities it follows that

$$\sum_i b_i = \sum_k t_k = \sum_i (bt)_{ik} = \sum_k (bt)_{ik} = \sum_j e_{ij} = \sum_j \eta_{ijk} = \sum_k \eta_{ijk} = 0.$$
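
The identity and the zero-sum constraints can be verified on any array of conceptual responses. The Python sketch below is illustrative: the function name `decompose`, the dimensions ($r = 2$, $t = 3$, $p = 2$) and the randomly generated responses are all assumptions made for the demonstration, not data from the paper.

```python
import random

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def decompose(y):
    """Split y[i][j][k] into mu, b_i, t_k, (bt)_ik, e_ij and eta_ijk."""
    r, s, t = len(y), len(y[0]), len(y[0][0])
    I, J, K = range(r), range(s), range(t)
    mu = mean(y[i][j][k] for i in I for j in J for k in K)
    y_i = [mean(y[i][j][k] for j in J for k in K) for i in I]
    y_k = [mean(y[i][j][k] for i in I for j in J) for k in K]
    y_ik = [[mean(y[i][j][k] for j in J) for k in K] for i in I]
    y_ij = [[mean(y[i][j][k] for k in K) for j in J] for i in I]
    b = [y_i[i] - mu for i in I]
    tr = [y_k[k] - mu for k in K]
    bt = [[(y_ik[i][k] - y_i[i]) - (y_k[k] - mu) for k in K] for i in I]
    e = [[y_ij[i][j] - y_i[i] for j in J] for i in I]
    eta = [[[(y[i][j][k] - y_ij[i][j]) - (y_ik[i][k] - y_i[i]) for k in K]
            for j in J] for i in I]
    return mu, b, tr, bt, e, eta

rng = random.Random(0)            # arbitrary illustrative population
r, t, p = 2, 3, 2
s = p * t
y = [[[rng.uniform(0, 10) for _ in range(t)] for _ in range(s)] for _ in range(r)]
mu, b, tr, bt, e, eta = decompose(y)

max_identity_error = max(
    abs(y[i][j][k] - (mu + b[i] + tr[k] + bt[i][k] + e[i][j] + eta[i][j][k]))
    for i in range(r) for j in range(s) for k in range(t))

constraint_sums = [sum(b), sum(tr)]
constraint_sums += [sum(bt[i][k] for i in range(r)) for k in range(t)]
constraint_sums += [sum(bt[i][k] for k in range(t)) for i in range(r)]
constraint_sums += [sum(e[i]) for i in range(r)]
constraint_sums += [sum(eta[i][j][k] for j in range(s))
                    for i in range(r) for k in range(t)]
constraint_sums += [sum(eta[i][j]) for i in range(r) for j in range(s)]
max_constraint_error = max(abs(v) for v in constraint_sums)
print(max_identity_error, max_constraint_error)
```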


Algebraically the sets of numbers $\{b_i\}$, $\{t_k\}$, $\{(bt)_{ik}\}$, $\{e_{ij}\}$ and $\{\eta_{ijk}\}$ are pairwise independent
in the sense that, for example, $t_k = 0$ for all $k$ does not imply $(bt)_{ik} = 0$ for all $i$ and $k$. In particular,
if $(bt)_{ik} = 0$ for all $i$ and $k$, then algebraically this does not imply that $\eta_{ijk} = 0$ for all $i$, $j$ and $k$.
On the other hand, the physical situation is such that if all the block-treatment interactions 
were zero, then one would expect that all the unit-treatment interactions would be zero. 
This follows from the fact that the experimental units are blocked so as to be more homo- 
geneous within a block than from block to block. If so, then the lack of a differential effect 
from block to block would lead one to expect that the differential effect from unit to unit 
within a block would be negligible. 

The converse is, of course, not true. If the units within a block are sufficiently homo- 
geneous then the unit-treatment interactions will be small in absolute value. But this does 


The essential point here is that it may not be unrealistic to assume that the treatments 
react additively (i.e. unit-treatment interactions are zero) within a block even though they 
react non-additively from block to block. 

In many instances, whether we assume the unit-treatment interactions are zero or not,
it may be reasonable to assume that the variability of units within a block is essentially the
same for all blocks. This assumption could be idealized as

$$\sum_j e_{ij}^2 = (s - 1)\sigma^2;$$
$$\sum_j \eta_{ijk}^2 = (s - 1)\sigma_\eta^2 \quad (i = 1, 2, ..., r;\ k = 1, 2, ..., t).$$

The importance of this discussion is in indicating the direction of simplifying assumptions
in the analysis of the design.


(2.3) The statistical model


In actually carrying out the experiment, we will in fact observe only a (restricted) random
sample of size $rs$ from the set of $rst$ numbers $\{y_{ijk}\}$. Let $x_{ikf}$ denote the observation obtained
from the $f$th replication of the $k$th treatment in the $i$th block, where $i = 1, 2, ..., r$;
$k = 1, 2, ..., t$; and $f = 1, 2, ..., p$ for each $(ik)$. Thus $\sum_f x_{ikf}$ represents the observed total
response from all units in block $i$ to which treatment $k$ has been applied.


To write an explicit model for $\sum_f x_{ikf}$ we now define some additional quantities. Let†

$\delta_{ij}^{k} = 1$ if treatment $k$ falls on unit $j$ of block $i$,
$\delta_{ij}^{k} = 0$ otherwise.

Because random methods of allocation are employed, the $\delta_{ij}^{k}$ may be treated as random
variables, and from the design of the experiment it is easy to specify certain of the distributional
properties of the $\delta_{ij}^{k}$. For example,
$$P(\delta_{ij}^{k} = 1) = P(\text{treatment } k \text{ falls on unit } j \text{ of block } i) = p/s.$$

Hence $E(\delta_{ij}^{k}) = p/s$,

where we follow usual convention and define $E(x)$ to be the mathematical expectation
of $x$. More detail on the $\delta_{ij}^{k}$ is given by Wilk (1953).

It is easy to see that

$$\sum_f x_{ikf} = p[\mu + b_i + t_k + (bt)_{ik}] + \sum_j e_{ij}\delta_{ij}^{k} + \sum_j \eta_{ijk}\delta_{ij}^{k}.$$

This relation we shall call the statistical model. This formulation exhibits explicitly just
what are the random variables involved in the model, namely, the $\delta_{ij}^{k}$ which take on the
values 0 or 1 with known probabilities. We note that it is the physical act of randomization
in allocation of treatments to units that permits us to treat the $\delta_{ij}^{k}$ as random variables,
and hence provides some basis for statistical inference.
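
The probability $p/s$ can be confirmed by listing every allocation within one block. The Python sketch below is an illustration (the function name `allocation_probability` is hypothetical): it enumerates the distinct ways of placing $t$ treatments, each on $p$ of the $s = pt$ units, treats them as equally likely, and counts those in which a chosen treatment falls on a chosen unit.

```python
from fractions import Fraction
from itertools import permutations

def allocation_probability(t, p, unit=0, treatment=0):
    """Exact P(treatment falls on unit) under random allocation in one block."""
    labels = [k for k in range(t) for _ in range(p)]  # each treatment p times
    allocations = set(permutations(labels))           # distinct allocations
    hits = sum(1 for a in allocations if a[unit] == treatment)
    return Fraction(hits, len(allocations)), len(allocations)

prob_22, count_22 = allocation_probability(t=2, p=2)
prob_32, count_32 = allocation_probability(t=3, p=2)
print(prob_22, count_22, prob_32, count_32)
```

Enumeration is only practical for small blocks, since the number of allocations is $s!/(p!)^t$; it is used here because it gives the probability exactly rather than by simulation.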


(2.4) The analysis of variance table
The primitive analysis of variance for this design is simply a breakdown of the sum of
squares of deviations of individual observations from their mean into additive components
which can be attributed to various sources. The analysis of variance has proved useful in
the statistical analysis of experiments in the estimation of components of variation, in the
estimation of the variance of estimates of treatment comparisons, and in making tests of
significance.

As before, we use the dot convention for means, e.g. $x_{i..} = \sum_{kf} x_{ikf}/s$. The algebraic detail
of the analysis of variance is given in Table 1.

We can now employ the statistical model for $\sum_f x_{ikf}$, and certain properties of the $\delta_{ij}^{k}$,
to derive the expectations under randomization of the analysis of variance mean squares.
The detailed algebra is given by Wilk (1953).

The results are tabulated in Table 2.

It is of interest to note that if the $\eta_{ijk}$ are not all zero, then even if $t_k = 0$ for all $k$, the
expectation of the treatment mean square is not equal to the expectation of the error mean
square. Similarly for the interaction mean square. If the $\eta_{ijk}$ are small compared with the
$e_{ij}$ or if $t$ is large, the bias is negligible. The assumptions

$$\sum_j e_{ij}^2 = (s - 1)\sigma^2, \quad \text{all } i,$$
$$\sum_j \eta_{ijk}^2 = (s - 1)\sigma_\eta^2, \quad \text{all } i \text{ and } k,$$

do not affect the above discussion.


† A more formal definition of the $\delta_{ij}^{k}$, in which they appear as characteristic set functions of set-valued
random variables, is given by Wilk (1953).


Clearly, for small $t$, a meaningful comparison of the treatment mean square (or interaction
mean square) with the error mean square depends heavily on the assumption that the $\eta_{ijk}$
are negligible.


Table 1. Analysis of variance

Due to         Degrees of freedom    Sum of squares                                      Mean squares
Blocks         (r-1)                 B = s Σ_i (x_i.. - x_...)²                          B* = B/(r-1)
Treatments     (t-1)                 T = rp Σ_k (x_.k. - x_...)²                         T* = T/(t-1)
Interactions   (r-1)(t-1)            I = p Σ_ik (x_ik. - x_i.. - x_.k. + x_...)²         I* = I/[(r-1)(t-1)]
Error          rt(p-1)               R = Σ_ikf (x_ikf - x_ik.)²                          R* = R/[rt(p-1)]
Total          rs-1                  G = Σ_ikf (x_ikf - x_...)²

Table 2. Expectations of mean squares under randomization 


Due to ee Expectation of mean square 
CPER" r E 
Blocks Be жЕ; > ی + راہ‎ zs + 
Treatments T* zB z e+ enun D + pe » (O) 
Interactions 1* wee 2 Det Goyer E nh 
f j Error к | D+ anh 


коес ————————————————_! 


In contrast with similar results based on normal theory, the expectation of the mean
square for blocks does not contain all components which appear in the expectation of the
error mean square. The same remark applies to a comparison of blocks mean square with
interaction mean square when all $(bt)_{ik} = 0$. It is apparent that an analysis of variance test
of significance of block effects, for the situation under consideration, cannot be justified
by randomization.


(2.5) Some randomization moments

In Table 3 we give some moments under randomization of the analysis of variance sums
of squares, under the hypothesis that $t_k = (bt)_{ik} = 0$ for all $i$ and $k$, and using the simplifying
assumptions:

$$\eta_{ijk} = 0, \quad \text{for all } i, j, k,$$
$$\sum_j e_{ij}^2 = (s - 1)\sigma^2, \quad \text{for all } i,$$
$$\sum_j e_{ij}^4 = (s - 1)(2s - 3)\sigma^4/s, \quad \text{for all } i.$$

We use $G_0$, $B_0$, $T_0$, $I_0$ and $R_0$ to denote the values of $G$, $B$, $T$, $I$ and $R$, respectively, under
the indicated conditions.
The results of Table 3 are derived from more general expressions, given by Wilk (1953),
which use no homogeneity assumptions regarding $\sum_j e_{ij}^2$ and $\sum_j e_{ij}^4$. The algebraic development
is also detailed in the same paper.


Table 3. Some randomization moments under simplifying assumptions 


E(G,) = Go = pii +s- 1) o? V(G,) = 0 

Е(В,) = Bo = им V(B,) = 0 

E(T,) = (t— 1) 0° V(T,) = :e-ye(i-7) 

тр, 

E(I,) = (r-1)(t- 1)0* V(I.) = 5-00-01) 

E(R,) = rt p—1) a? V(R,) = P= ot 
E 00 70 фт Ry = - AED 

тр wr 

ву хри) 


It is of some interest to note that for large $r$ or $p$, $V(T_0)$ approaches $2(t - 1)\sigma^4$; for large $p$,
$V(I_0)$ approaches $2(r - 1)(t - 1)\sigma^4$; for large $p$, $V(R_0)$ becomes independent of $p$ and
approaches $2r(t - 1)\sigma^4$; if $p = 1$, $V(I_0)$ becomes $(2r - 1)(t - 1)\sigma^4/r$; and thence for large $r$
becomes independent of $r$. The correspondence to and divergence from the corresponding
moments based on normal theory will be apparent.


The correlation between T, and his — n БУС РЕТ which is —1 for p = 1, ES 


goes to 0 as p increases. The correlation between T, and Ry is — J I which for large p 
y * 


goes ‘ot / 5 and for large r goes to zero. The correlation between J, and Ry is 


_ -9-9 JN 
(rp-r-1) ' І : 
(r-1) "n 
which for large р goes to — I and for large r goes to — 1. PRI О 
Ы US 


3. ESTIMATION 


This section deals with the estimation of certain functions of the parameters of the popula- 
tion model. Variances of the estimates under various conditions are given, and the estima- 
tion of these variances is considered. Some additional results, as well as the algebraic detail, 


are given by Wilk (1953). 




The following general notation is used:

$V(x)$ is the variance of $x$ under no assumptions.
$V_1(x)$ is the variance of $x$ under the homogeneity assumptions that

$$\sum_j e_{ij}^2 = (s - 1)\sigma^2, \quad \sum_j \eta_{ijk}^2 = (s - 1)\sigma_\eta^2 \quad \text{and} \quad \sum_j e_{ij}\eta_{ijk} = 0,$$

for each $i = 1, 2, ..., r$ and $k = 1, 2, ..., t$.

$V_0(x)$ is the variance of $x$ under the assumption that all the $\eta_{ijk}$ are zero.
$v$ denotes the analysis of variance error mean square.


(а) An unbiased estimate of wis 2 = х: 
V(f)-— Pbis-Dn = nz a під and there appears to be no reasonable way to estimate x Wig. 


VÊ) = 0, so wis M without error if the n;,, are all zero. 
(b) An unbiased estimate of b; is b, LmEL—S 
i (r=2) 1 
үф) = rts(s— 1) Х Eq Pis(s— 1) 2 Eni». 
¥,(6,) = — -lat No reasonable estimate of either V(6,) or V,(5;) is available. 


үоф,) = o 80 the block effects are known precisely if all the ту, are zero. 
(с) An unbiased estimate of t, is й, = x p, —2, : 


V(f,) = AA = ij [«- 1) D e$ + 2(t—2) > 47+ (£— 3) Y nati у> Ж] A 


70) = m o+ 01)- os and e- =з, tends to overestimate (f). 
t- a 


Vo.) = е and v is an unbiased estimate of (Ё). 


LE s 
(d) An unbiased estimate of. "ue is UD = Xu — Xz — Uy +e: 
-y|t- 1) r—2) Ee +! 


+ 2(t— 2) (r — 2) > jij t - 


(у = = 


rs(5— 1 Ee 


2(rt—r—2t+1) 


r 
2 |: 
A —1l)(t— == — _ 
Vi (bt), = gautz 1) o4 о%)— ке оё, and so IDEEN. tends to overestimate 
Ы re rs r8 
(6) л. 
(е) An unbiased estimate of the treatment mean (+t) is (A +) = ay: 


Е s 
(A+) = = s =. ) -Elet Reij Mije + nij). 


(t—1) 
rs 


È езт, 
a J 


+(rt—3r— 2+4) X nda + 
1 


mari) = E 


(o* - 05), and 


v is an unbiased estimate of (A + £,). 


7(2 i) = E 


PE Eip 8 


na *—1) v is an unbiased estimate of 7(2 +f). 


(f) An unbiased estimate of the treatment contrast ры. with X p, = 0, is 
k k 


Ends = Ури. —2...): 
1 е) 1 
Vee) = |Ен тайната). 


1 1 1 
Who — p E09 (8+ on) 7 ат) BE Pema)” and so Ti LL tends to 


overestimate Де рй). 


VS prin) = = (Spi) X e3}, and this is estimated unbiasedly by A (D pi) v. 
Е iki y “TP k 


1 
rip(s — 

We close this section with a brief discussion of special cases. If r = 1, p > 1 (completely
randomized design), the discussion of block effects and interactions is to be ignored, and
the remaining results carry over directly.

For the case of p = 1, r > 1 (randomized block design) the situation is somewhat different
in that the analysis of variance error mean square which we have been using becomes
non-existent. If we can assume that all block-treatment interactions are zero, then the
interaction mean square in the randomized block design has expectation

1 (t—2) 
п) н-т" 
and for most of the cases discussed above, the variances of estimates may be estimated
unbiasedly.
Kempthorne (1952a, b) has discussed in detail the randomization analysis of the
randomized block and completely randomized designs.


4. TESTS OF SIGNIFICANCE


In this section we consider tests of significance of a number of null hypotheses (hypotheses 
of equivalence) of possible interest. The object of the test is to obtain a measure of the 
adequacy of the experiment to indicate conclusions. 

To employ a randomization test of a null hypothesis we require no assumptions about the 
form of the frequency distribution of the observations. A real-valued function of the 
observations is selected which will reflect (in a monotone increasing fashion, say) the devia- 
tion from equivalence of the treatments. 

Having obtained a set of numbers for the particular experimental arrangement, we 
associate these numbers with the experimental units from which they derived. We then 
evaluate the function selected for the actual disposition of the treatments and for all other 
dispositions possible under the restriction of the design. A measure of the strength of the 
evidence against the null hypothesis (i.e. the level of significance† of the experiment) is
the proportion of the values of the function which exceed the ‘observed’ value. The initial 
introduction of randomization enables a probabilistic interpretation. 
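
The procedure just described can be sketched in a few lines for a very small experiment. The sketch below is an illustration only, under stated assumptions: it treats the ordinary randomized block design with one unit per treatment in each block (not the generalized design of this paper), it takes the treatment sum of squares as the test function, and it counts dispositions whose value is at least as large as the observed one (the text says "exceed"; counting ties is a common convention). The function names are hypothetical.

```python
from itertools import permutations, product


def treatment_ss(blocks):
    """Between-treatment sum of squares.

    `blocks` is a sequence of blocks; position k within a block holds the
    response of the unit that received treatment k.
    """
    r, t = len(blocks), len(blocks[0])
    grand = sum(sum(b) for b in blocks) / (r * t)
    means = [sum(b[k] for b in blocks) / r for k in range(t)]
    return r * sum((m - grand) ** 2 for m in means)


def randomization_p_value(blocks):
    """Proportion of within-block rearrangements whose criterion is at
    least as large as the observed one (exhaustive enumeration)."""
    observed = treatment_ss(blocks)
    count = total = 0
    # Every disposition allowed by the design: permute treatments
    # independently within each block.
    for arrangement in product(*(permutations(b) for b in blocks)):
        total += 1
        if treatment_ss(arrangement) >= observed - 1e-12:
            count += 1
    return count / total


# Two treatments in three blocks: 2**3 = 8 possible dispositions.
example = [[1, 2], [1, 3], [2, 5]]
p_value = randomization_p_value(example)
```

With t treatments in r blocks the enumeration visits (t!)^r dispositions, which illustrates why the computation quickly becomes prohibitive.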

In general, the amount of computation implied by the above procedure is prohibitive. 
Consequently, several writers (Welch, 1937; Pitman, 1937; Kempthorne, 1952a, b) have
studied the possible approximation to the randomization test of certain procedures which 


† A high level of significance corresponds to a low proportion.




derive from normal theory assumptions. Since the normal theory criteria fulfil the
requirement for a randomization test function, the approach has been to examine the
correspondence by comparing randomization means and variances of these criteria with their
normal theory analogues. We proceed to make a similar examination for several null hypotheses
of possible interest in the generalized randomized block design. The normal theory analysis 
for this design has been outlined by Wilk (1953). 

In all that follows in this section we make the assumption that the у are all zero, and 
that Y е, = (s— 1)c?, Le = (s— 1) (2s—3) 048 (i = 1,2,...,7). 

j 


(a) Consider the null hypothesis that the treatments all react identically on every
experimental unit. This implies that $y_{ijk} = y_{ij\cdot}$ for all i, j, k, and hence
$t_k = (bt)_{ik} = \eta_{ijk} = 0$ for all i, j, k.

The normal theory test of this hypothesis may be based on the criterion


$$(T_0 + I_0)/(T_0 + I_0 + R_0),$$


which under the usual normal theory assumptions and under the present null hypothesis 
follows a Beta distribution 


$$f(q) = K q^{\frac{1}{2}f_1 - 1}(1 - q)^{\frac{1}{2}f_2 - 1} \quad (0 < q < 1),$$


where $f_1 = r(t-1)$, $f_2 = rt(p-1)$.
Let U and U* denote the criterion above under normal theory and randomization
respectively. Then 0 < U < 1 and 0 < U* < 1. Since the denominator of U* is constant under


randomization H(U*) = ia and | AU) = ра, Under normal theory, 4 
1—1 901—1 —1 ! 
E(U) = £, V(U) & be E dn Thus E(U*) = E(U); and for large s or small r, 4 


V(U) approaches V(U*). 

If we can judge the correspondence of the distributions of U and U* from the comparison 
of their ranges, means and variances, it would appear that the randomization test may be 
approximated by the usual test based on normal theory. 

To better the approximation to the randomization distribution of U* by a Beta dis- 
tribution, we might use a Beta variate Q, with density 


f(Q)- K1- Q} (0<Q <1), 


f (t— 1) 
such that E(Q) Th +h ете E(U*), 
Dri bs 2f, fs m" 1)(p— 1) 
Ер, "T WARS ATS- naci "О": 
; _ (t—1) (rs— 2) t(p —1)(rs—2 
Then h= GET е Љ= عد‎ а). 


For different assumptions regarding the quantities $\sum_j e_{ij}^2$ and $\sum_j e_{ij}^4$ $(i = 1, 2, \ldots, r)$, we
would thus arrive at different Beta distributions to approximate the randomization test.
Welch (1937) considered such an adjustment for the case of the randomized block design 
when the blocks do not exhibit equal variation. 




(b) We consider now a situation in which we can assume that $(bt)_{ik} = 0$ for all i and k
and wish to test the null hypothesis that $t_k = 0$, k = 1, 2, ..., t.
The criterion suggested by normal theory for this situation is $T_0/(T_0 + I_0 + R_0)$. Let W
denote this quantity under normal theory, and W* denote it under randomization.
Then under the null hypothesis, 0 < W < 1, 0 < W* < 1, and W is a Beta variate with
2(t—1) (rs —r—t—1); (0—1) 


_ 0-1) ' 
E(W) = ent) and V(W) = эйе кей) while E(W*) = es pu 


„_ 2-1) 1 

vorm - decipi) 

Thus E(W) = E(W*), and for large values of p or r V(W) and V(W*) are approximately
equal. 

Of course as in (a) we can define a Beta variate which will have the same mean and vari- 
ance as W* and, perhaps, in this way better the approximation. 

A special case of interest is that in which p = 1, r > 1, i.e. the randomized block design.
In that case, $R_0$ does not exist, and carrying out a meaningful analysis of variance test of
significance for that design hinges upon the assumption that the block-treatment
interactions are zero.

Also, of particular concern here is the case of r = 1, p > 1, i.e. the completely randomized
design. In that case, $I_0$ does not exist, and the test we have discussed is the one of interest
for that design.


It is a pleasure to acknowledge the assistance of Prof. O. Kempthorne in the preparation
of this paper.


REFERENCES 


FISHER, R. A. (1926). The arrangement of field experiments. J. Minist. Agric. 33, 503-13.

FISHER, R. A. (1935). The Design of Experiments. Edinburgh: Oliver and Boyd.

KEMPTHORNE, O. (1952a). The Design and Analysis of Experiments. New York: John Wiley and Sons.

KEMPTHORNE, O. (1952b). The randomization theory of experimental inference. Paper delivered to the
Amer. Stat. Ass. and Biometrics Soc. Publication pending.

PITMAN, E. J. G. (1937). Significance tests which may be applied to samples from any population.
III. The analysis of variance test. Biometrika, 29, 322-35.

WELCH, B. L. (1937). On the z-test in randomized blocks and latin squares. Biometrika, 29, 21-52.

WILK, M. B. (1953). The randomization analysis of a generalized randomized block design. M.S.
thesis, Department of Statistics, Iowa State College.




SOME QUICK SIGN TESTS FOR TREND IN LOCATION
AND DISPERSION


By D. R. COX
Statistical Laboratory, University of Cambridge
AND A. STUART
Division of Research Techniques, London School of Economics


1. INTRODUCTION AND SUMMARY


Many distribution-free tests have been devised to test the hypothesis of randomness of 
a series of N observations, i.e. the hypothesis that N independent random variables have 
the same continuous distribution function. Of these, the rank correlation tests are the most 
efficient tests against normal trend alternatives, but others are of some use in situations 
where speed and simplicity of computation are important. 

In this paper, we discuss a class of simple sign tests, considered first as tests against trend
in location. Optimum tests are found from the standpoint of asymptotic relative efficiency
(a measure of local power in large samples), and it appears that the best of these tests may 
be preferred to the other simple tests considered in the literature, although they are, of 
course, less efficient than the rank correlation tests. 

Similar tests are available for trend in dispersion, and the efficiency of these, in the normal 
situation, is investigated and compared with the test based on the maximum-likelihood 
estimator. Finally, we add a few remarks on sequential sign tests. 

Readers not interested in the theory should look at §§ 10, 11 and 14, where there are brief
statements of the tests and numerical examples. 


2. THE SIGN TESTS FOR TREND IN LOCATION 


We consider a series of N independent observations from a standardized normal regression 
model with an upward trend, i.e. 


$$H_1:\quad y_i = \alpha + \Delta i + \epsilon_i \quad (i = 1, 2, \ldots, N),$$


where $\Delta > 0$ and the $\epsilon_i$ are independent standardized normal variates. We wish to test the
null hypothesis
$$H_0:\quad \Delta = 0,$$


using a distribution-free test statistic, so that our test will remain valid whatever the
continuous distribution of the $\epsilon$ terms in the model, although naturally its efficiency will vary
with the form of the distribution.

The most efficient known distribution-free tests of $H_0$ are those based on the rank
correlation coefficients (Stuart, 1954), but our object here is specifically to find tests which are
quick and simple to compute. We define for i < j the score


$$h_{ij} = \begin{cases} +1 & \text{if } y_i > y_j, \\ 0 & \text{if } y_i < y_j. \end{cases}$$




$h_{ij}$ is thus based on a comparison of the ith and jth in the series of observations. The
distribution of the observations will be assumed continuous so that the possibility of ties can
be ignored (see, however, § 16). 

We confine ourselves throughout to comparisons of independent pairs of observations, 
i.e. no observation is compared with more than one other observation. (This is in contrast 
to the procedure used in calculating the rank correlation coefficients, where every observa- 
tion is compared with every other observation in the series.) Since there are N observations, 
there can be no more than $\tfrac{1}{2}N$ such independent comparisons. We now assume N to be even
and always take i < j. Our problem is to find the set of comparisons and the appropriate
weights $w_{ij}$ which will make the statistic


$$S = \sum w_{ij} h_{ij} \tag{1}$$


as efficient a test of $H_0$ as possible. The summation in (1) contains $\tfrac{1}{2}N$ terms, no suffix being
repeated. All tests of the form (1) are distribution-free, since on the null hypothesis any
$h_{ij}$ is a 0-1 variate with probabilities $(\tfrac{1}{2}, \tfrac{1}{2})$, whatever the distribution of the $\epsilon_i$.


3. ASYMPTOTIC RELATIVE EFFICIENCY 


We shall use as our criterion of efficiency the asymptotic relative efficiency (A.R.E.) of a test. 
If there are two consistent tests, s and t, of a hypothesis $H_0$: $\Delta = 0$, the A.R.E. is the reciprocal
of the ratio of sample sizes required to attain the same power against the same alternative
hypothesis $H_1$, taking the limit as the sample size N tends to infinity and as $H_1$ tends to $H_0$.
(This second limiting process is necessary to keep the power of consistent tests bounded
away from 1.) Pitman (1948) and Noether (1955) have shown that, if s and t both have
normal limiting distributions on $H_0$ and $H_1$, the A.R.E. of s compared to t is given by


$$\text{A.R.E.}(s, t) = \lim_{N \to \infty} \left\{ \frac{R(s)}{R(t)} \right\}^{1/r}, \tag{2}$$
where
$$R^2(X) = \left\{ \left[ \frac{\partial}{\partial \Delta} E(X \mid \Delta) \right]_{\Delta = 0} \right\}^2 \Big/ D^2(X \mid \Delta = 0), \tag{3}$$


provided that r satisfies the equations
$$\lim_{N \to \infty} R(s) N^{-r} = R_s, \qquad \lim_{N \to \infty} R(t) N^{-r} = R_t. \tag{4}$$


Here E and $D^2$ denote mean and variance as usual, while $R_s$ and $R_t$ are constants
independent of N. The interpretation of the A.R.E. is discussed critically in § 9 below.


4. THE BEST SIGN TEST 


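Since $h_{ij}$ is a 0-1 variate,
$$E(h_{ij}) = \operatorname{prob}(y_i > y_j),$$
and as $(y_j - y_i)$ is a normal variate with mean $(j - i)\Delta$ and variance 2 this is
$$E(h_{ij}) = G\left[ -\frac{(j - i)\Delta}{\sqrt{2}} \right], \tag{5}$$
where
$$G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}t^2} \, dt.$$
Now
$$\left[ \frac{\partial G(x)}{\partial x} \right]_{x = 0} = \frac{1}{\sqrt{2\pi}}, \tag{6}$$
so that, by (5) and (6),
$$E'(h_{ij}) = \left[ \frac{\partial E(h_{ij})}{\partial \Delta} \right]_{\Delta = 0} = -\frac{(j - i)}{2\sqrt{\pi}}. \tag{7}$$

We now write $(j - i) = r_{ij}$. Using (7) in (1), we obtain
$$E'(S) = \sum w_{ij} E'(h_{ij}) = -\frac{1}{2\sqrt{\pi}} \sum w_{ij} r_{ij}, \tag{8}$$
and
$$D^2(S \mid \Delta = 0) = \sum w_{ij}^2 V(h_{ij} \mid H_0) = \tfrac{1}{4} \sum w_{ij}^2. \tag{9}$$
Equations (3), (8) and (9) give
$$R^2(S) = \frac{\left( \sum w_{ij} r_{ij} \right)^2}{\pi \sum w_{ij}^2}, \tag{10}$$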

and we now wish to maximize (10) to obtain the highest possible A.R.E. We do this in two
stages. First we maximize (10) with respect to the $w_{ij}$, regarding the $r_{ij}$ as fixed, and then
we choose the supremum of these maxima for variations in the $r_{ij}$.

To maximize (10) for fixed $r_{ij}$ and variation in the $w_{ij}$, we must maximize $\sum w_{ij} r_{ij}$ subject
to $\sum w_{ij}^2$ being held constant, i.e. we must unconditionally maximize


$$F = \sum w_{ij} r_{ij} - \lambda \sum w_{ij}^2.$$


It is clear from the conditions of the problem that each $w_{ij}$ will be a function of the
corresponding $r_{ij}$, so that on differentiating F for a stationary value we get


$$\frac{\partial F}{\partial w_{ij}} = r_{ij} - 2\lambda w_{ij} = 0,$$
i.e.
$$\frac{r_{ij}}{w_{ij}} = 2\lambda.$$
This is satisfied by
$$w_{ij} = r_{ij}, \tag{11}$$


so that the required set of weights are proportional to the distances apart of the observations
compared. The stationary value is a maximum. Substituting (11) into (10), we have


$$R^2(S) = \frac{1}{\pi} \sum r_{ij}^2. \tag{12}$$


This is the maximum value of $R^2(S)$ for a fixed set of $r_{ij}$. The $r_{ij}$ are a set of $\tfrac{1}{2}N$ differences
between pairs of integers chosen from the integers 1, 2, ..., N. It is easily seen that $\sum r_{ij}^2$ is
largest when the pairs are (1, N), (2, N - 1), (3, N - 2) and so on. In general


$$r_k = (N - k + 1) - k = N - 2k + 1 \quad (k = 1, 2, \ldots, \tfrac{1}{2}N), \tag{13}$$
so that
$$\sum r_k^2 = \sum_{k=1}^{\frac{1}{2}N} (N - 2k + 1)^2 = \tfrac{1}{6} N(N^2 - 1), \tag{14}$$
and the supremum value of (12) is therefore
$$R^2(S_1) = \frac{N(N^2 - 1)}{6\pi} \sim \frac{N^3}{6\pi}. \tag{15}$$


We have denoted by $S_1$ the optimum S statistic


$$S_1 = \sum_{k=1}^{\frac{1}{2}N} (N - 2k + 1) \, h_{k, N-k+1},$$


for which
$$E(S_1 \mid \Delta = 0) = \tfrac{1}{2} \sum (N - 2k + 1) = \tfrac{1}{8} N^2, \qquad D^2(S_1 \mid \Delta = 0) = \tfrac{1}{4} \sum (N - 2k + 1)^2 = \tfrac{1}{24} N(N^2 - 1). \tag{16}$$


The test based on $S_1$ is essentially a simplified version of Spearman's rank correlation
test, which is in effect defined by


$$V = \sum_{i<j} (j - i) h_{ij}, \tag{17}$$


where the summation extends over all possible $\tfrac{1}{2}N(N - 1)$ pairs of observations. Stuart
(1954) has shown that
$$R^2(V) \sim \frac{N^3}{4\pi}, \tag{18}$$
so that using (15) and (18) in (2) and (4) with $r = \tfrac{3}{2}$, we obtain for the A.R.E. of $S_1$ compared
to V
$$\text{A.R.E.}(S_1, V) = (\tfrac{2}{3})^{1/3} = 0.87. \tag{19}$$
The loss of A.R.E. involved in reducing the number of comparisons from $\tfrac{1}{2}N(N - 1)$ to $\tfrac{1}{2}N$
is as little as 13%.
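
As an illustration of the weighted statistic, the following minimal sketch computes $S_1$ for a series together with its null mean and variance, N²/8 and N(N² - 1)/24, and the resulting standardized deviate. The function name `s1_test` is hypothetical; dropping the middle observation of an odd-length series follows the device used in the numerical example of § 10, and the normal approximation to the null distribution is assumed.

```python
import math


def s1_test(y):
    """Weighted sign statistic S_1 with its null moments.

    Returns (s1, null_mean, null_variance, standardized_deviate).
    """
    y = list(y)
    if len(y) % 2 == 1:
        # Odd N: ignore the middle observation and proceed with N - 1.
        del y[len(y) // 2]
    n = len(y)
    # Compare the k-th observation with the (N - k + 1)-th, k = 1..N/2,
    # and weight a unit score by the distance apart, N - 2k + 1.
    s1 = sum(
        n - 2 * k + 1
        for k in range(1, n // 2 + 1)
        if y[k - 1] > y[n - k]
    )
    mean = n * n / 8
    var = n * (n * n - 1) / 24
    return s1, mean, var, (s1 - mean) / math.sqrt(var)


falling = s1_test([4, 3, 2, 1])   # every earlier value exceeds the later one
rising = s1_test([1, 2, 3, 4])    # no unit scores at all
```

Small values of $S_1$ point to an upward trend and large values to a downward trend, since a unit score is recorded when the earlier observation is the larger.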
These values of the A.R.E. depend on the assumption of normality, but the calculation 


of the form of the optimum statistic, S,, and also of the statistic, Sj, of $5, does not. For (7) 
remains true, with a changed numerical factor, for general continuous distributions. 


5. AN UNWEIGHTED SIGN TEST 


The relatively high efficiency of $S_1$ compared to V leads us to construct, by analogy, a
simplified version of Kendall's rank correlation test, which may be defined by


$$Q = \sum_{i<j} h_{ij}, \tag{20}$$

and gives equal weight to all $\tfrac{1}{2}N(N - 1)$ comparisons. Q has the same A.R.E. as V (Stuart,
1954). The analogous sign test, based on $\tfrac{1}{2}N$ equally weighted independent comparisons, is
$$S_2 = \sum h_{ij}, \tag{21}$$

and using (10), we obtain, with all $w_{ij} = 1$,

$$R^2(S_2) = \frac{2}{N\pi} \left( \sum r_{ij} \right)^2. \tag{22}$$
We now require to choose $\tfrac{1}{2}N$ pairs from the first N integers so that (22) or, equivalently,
$\sum (j - i) = \sum r_{ij}$ takes its maximum value. This occurs whenever every i is chosen from the
first $\tfrac{1}{2}N$ integers and every j from the last $\tfrac{1}{2}N$ integers. In particular, it occurs when every
$r_{ij} = \tfrac{1}{2}N$ exactly, so that
$$S_2 = \sum_{k=1}^{\frac{1}{2}N} h_{k, \frac{1}{2}N + k}, \tag{23}$$
and (22) becomes
$$R^2(S_2) = \frac{N^3}{8\pi}. \tag{24}$$
Using (24) and (18), we obtain
$$\text{A.R.E.}(S_2, V) = (\tfrac{1}{2})^{1/3} = 0.79, \tag{25}$$
while from (15)
$$\text{A.R.E.}(S_2, S_1) = (\tfrac{3}{4})^{1/3} = 0.91. \tag{26}$$






Thus the simplified version of Kendall's rank correlation test is 21% less efficient than
Q or V, and 9% less efficient than the simplified Spearman coefficient $S_1$. The use of $S_2$ is
equivalent to a test considered by Theil (1950).


6. THE BEST UNWEIGHTED SIGN TEST 


However, we can improve on the efficiency of $S_2$, and in fact get very nearly as high an
efficiency as that of $S_1$, by 'throwing away' some of the $\tfrac{1}{2}N$ comparisons and retaining equal
weights for the others. This was suggested by one of the present authors in the discussion 
of Foster & Stuart (1954); it leads to an increase in efficiency because, by comparing 
observations further apart, individual comparisons are made more sensitive. 

In (1), let every $w_{ij}$ be either 0 or 1, and let m ($\le \tfrac{1}{2}N$) be the number of non-zero $w_{ij}$.
For our new statistic $S_3$ we have, as in (8),


$$E'(S_3) = -\frac{1}{2\sqrt{\pi}} \sum w_{ij} r_{ij} \quad (w_{ij} = 0 \text{ or } 1), \tag{27}$$
and from (9)
$$D^2(S_3 \mid \Delta = 0) = \tfrac{1}{4} m, \tag{28}$$
so that (3), (27) and (28) give

$$R^2(S_3) = \frac{1}{\pi m} \left( \sum w_{ij} r_{ij} \right)^2 \quad (w_{ij} = 0 \text{ or } 1). \tag{29}$$


To maximize this by choice of m and $r_{ij}$, we again work in two stages. For fixed m, (29) will
take its largest value when the comparisons given zero weights are based on the middle
(N - 2m) observations, while every i is chosen from the first m observations and every j is
chosen from the last m observations. In particular, this will be so when every $r_{ij} = (N - m)$
exactly, so that


$$S_3 = \sum_{k=1}^{m} h_{k, N-m+k}, \tag{30}$$
and (29) becomes
$$R^2(S_3) = \frac{m(N - m)^2}{\pi}. \tag{31}$$


(31) is the largest possible value of $R^2(S_3)$ for fixed m. ($S_2$ is the special case of $S_3$ when
$m = \tfrac{1}{2}N$.) We now choose m to maximize (31). Differentiating, we get

$$m = \tfrac{1}{3}N \tag{32}$$
for a maximum, so that finally
$$S_3 = \sum_{k=1}^{\frac{1}{3}N} h_{k, \frac{2}{3}N + k},$$


for which
$$E(S_3) = \tfrac{1}{6}N, \qquad V(S_3) = \tfrac{1}{12}N, \tag{33}$$
and from (31),
$$R^2(S_3) = \frac{4N^3}{27\pi}. \tag{34}$$
From (34) and (18), we have
$$\text{A.R.E.}(S_3, V) = (\tfrac{16}{27})^{1/3} = 0.84, \tag{35}$$
while from (15)
$$\text{A.R.E.}(S_3, S_1) = (\tfrac{8}{9})^{1/3} = 0.96. \tag{36}$$


Compared with either V or $S_1$, $S_3$ has about 5% higher efficiency than $S_2$, and in fact its
efficiency is 96% of that of $S_1$, so that for practical purposes it may be recommended instead
of $S_1$ because it requires no weighting of the comparisons.
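
A minimal sketch of the unweighted test follows, with exact binomial tail probabilities in place of tables. The function name `s3_test` is hypothetical, and the handling of a series whose length is not a multiple of three (taking m as the integer part of N/3 and comparing the first m observations with the last m) is an assumption of this sketch rather than a rule stated in this section.

```python
from math import comb


def s3_test(y):
    """Unweighted sign statistic S_3 on the outer thirds of the series.

    Returns (s3, m, p_upward, p_downward): p_upward = P(S <= s3) and
    p_downward = P(S >= s3) under the binomial (1/2 + 1/2)^m.
    """
    n = len(y)
    m = n // 3            # number of comparisons (assumption when 3 does not divide N)
    offset = n - m        # equals 2N/3 when N is a multiple of three
    # Unit score when the earlier observation exceeds the later one.
    s3 = sum(1 for k in range(m) if y[k] > y[k + offset])
    total = 2 ** m
    p_upward = sum(comb(m, j) for j in range(s3 + 1)) / total
    p_downward = sum(comb(m, j) for j in range(s3, m + 1)) / total
    return s3, m, p_upward, p_downward


# A strictly increasing series of 15 values: no unit scores in 5 comparisons.
result = s3_test(list(range(1, 16)))
```

For N = 15 the smallest attainable one-sided level is (1/2)^5, about 0.031, which agrees with the significance level quoted for that sample size in Tables 3 and 4.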




7. COMPARISON OF THE SIGN TESTS 
In Table 1, the A.R.E. of the sign tests are tabulated, compared to each other, to the rank 
correlation tests, and to the best (parametric) test against normal regression, based on the 
sample regression coefficient b, which has a value of (3) given by

$$R^2(b) = \tfrac{1}{12} N(N^2 - 1) \sim \tfrac{1}{12} N^3, \tag{37}$$
as follows immediately from the fact that b is an unbiased estimator of $\Delta$ with variance
$12/\{N(N^2 - 1)\}$.


Table 1. Asymptotic relative efficiencies of sign tests 


Asymptotic relative efficiency 


Test statistic 


From (2), (18) and (37), it follows that the A.R.E. of either rank correlation coefficient
compared to b is
$$\text{A.R.E.}(V, b) = (3/\pi)^{1/3} = 0.98, \tag{38}$$


and not $3/\pi = 0.95$ as given by Stuart (1954).
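
The pairwise efficiencies quoted in the text can be recomputed from the leading coefficients of $R^2$ in (15), (18), (24), (34), (37) and (44), each of which is of order N³, so that $r = \tfrac{3}{2}$ in (4). The sketch below is only an arithmetic check of those values and is not a substitute for the printed Table 1; the dictionary and function names are hypothetical.

```python
import math

# Leading coefficient c in R^2 ~ c * N^3 for each statistic.
R2_COEFF = {
    "b": 1 / 12,                  # regression coefficient, (37)
    "V": 1 / (4 * math.pi),       # rank correlation (also Q), (18)
    "S1": 1 / (6 * math.pi),      # weighted sign test, (15)
    "S2": 1 / (8 * math.pi),      # unweighted, all N/2 comparisons, (24)
    "S3": 4 / (27 * math.pi),     # unweighted, outer thirds, (34)
    "B": 1 / (8 * math.pi),       # median test, (44)
}


def are(s, t, r=1.5):
    """A.R.E. of s relative to t from (2): (R_s / R_t)^(1/r)."""
    return (R2_COEFF[s] / R2_COEFF[t]) ** (1 / (2 * r))


table = {(s, t): round(are(s, t), 2) for s in R2_COEFF for t in R2_COEFF}
```

For instance `table[("S3", "V")]` evaluates (16/27)^(1/3) and `table[("V", "b")]` evaluates (3/π)^(1/3), the quantities in (35) and (38).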


8. COMPARISON WITH A.R.E. OF OTHER TESTS


Apart from the two rank correlation tests already discussed, Stuart (1954) investigated 
the A.R.E. of three other distribution-free tests for trend in location. Two of these, the rank 
serial correlation test and the turning point test, were found to have zero values of R as 
defined by (3); the third, the difference-sign test, was found to have a value of r equal to $\tfrac{1}{2}$
in (4), as against $r = \tfrac{3}{2}$ for all the tests considered in this paper. It followed that the three
tests mentioned all have A.R.E. zero compared to the rank correlation tests (and hence to 
all the tests discussed here). Noether (1955) gives general results which rigorize these 
conclusions. 

A well-known and simple test which has not, as far as we know, previously been discussed
from the point of view of A.R.E. is the median test, due to Brown & Mood (1951). The N
(even) observations are divided into two sets of $\tfrac{1}{2}N$ consecutive observations. The test




statistic is simply the number of observations in the first set which exceed the sample
median $y_m$, and it is therefore defined by


$$B = \sum_{i=1}^{\frac{1}{2}N} b_{im}, \tag{39}$$
where
$$b_{im} = \begin{cases} 1 & \text{if } y_i > y_m, \\ 0 & \text{if } y_i < y_m. \end{cases}$$
The A.R.E. of B is easily obtained. We know that $y_i$ is a normal variate with mean $(\alpha + i\Delta)$
and unit variance. It follows that the sample median $y_m$ is asymptotically a normal variate
with mean $(\alpha + \tfrac{1}{2}(N + 1)\Delta)$ and variance of order $N^{-1}$. Since $y_i$ and $y_m$ are asymptotically
independent, $(y_i - y_m)$ is asymptotically normal with mean $\Delta[i - \tfrac{1}{2}(N + 1)]$ and unit variance,
so that for $i < \tfrac{1}{2}(N + 1)$

$$E(b_{im}) = \operatorname{prob}(y_i > y_m) \sim 1 - G\{\Delta[\tfrac{1}{2}(N + 1) - i]\}. \tag{40}$$


Using (6) in (40), we obtain


$$E'(b_{im}) = -\frac{1}{\sqrt{2\pi}} [\tfrac{1}{2}(N + 1) - i], \tag{41}$$
so that from (39) and (41),
$$E'(B) = \sum_{i=1}^{\frac{1}{2}N} E'(b_{im}) = -\frac{1}{\sqrt{2\pi}} \sum_{i=1}^{\frac{1}{2}N} [\tfrac{1}{2}(N + 1) - i] = -\frac{N^2}{8\sqrt{2\pi}}. \tag{42}$$
Also (Brown & Mood, 1951)
$$D^2(B \mid \Delta = 0) \sim \frac{N}{16}. \tag{43}$$
(42) and (43) give, in (3),
$$R^2(B) \sim \frac{N^3}{8\pi}. \tag{44}$$


Comparison of (44) with (24) shows that B has precisely the same A.R.E. as $S_2$ and is therefore
slightly less efficient than $S_1$ and $S_3$. If the observations are available in serial order, $S_3$ is
simpler to compute than B, which involves ranking all the observations to find the median,
and then making $\tfrac{1}{2}N$ comparisons, as against $\tfrac{1}{3}N$ for $S_3$. There is therefore no reason to prefer
B to $S_3$ in this case. If, however, the data were available graphically, B would be
considerably easier to compute, and this would outweigh the slight loss of efficiency compared to $S_3$.
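
For comparison with the sign tests, the median statistic of (39) may be sketched as follows. The function name is hypothetical; N is assumed even and free of ties, the sample median is taken as the midpoint of the two central order statistics, the null mean N/4 follows by symmetry, and the asymptotic variance N/16 of (43) is used for the standardized deviate.

```python
import math


def median_test(y):
    """Median statistic B: number of first-half observations above the
    sample median, with a standardized deviate from the approximate
    null moments N/4 and N/16."""
    n = len(y)                      # assumed even
    ordered = sorted(y)
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    b = sum(1 for value in y[: n // 2] if value > median)
    deviate = (b - n / 4) / math.sqrt(n / 16)
    return b, deviate


b_rising, z_rising = median_test([1, 2, 3, 4, 5, 6])
b_falling, z_falling = median_test([6, 5, 4, 3, 2, 1])
```

The sort needed to locate the median is the extra labour referred to in the text when the observations are given in serial order.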


9. COMPARISON OF THE POWERS OF TESTS 


So far we have compared tests by the A.R.E. in the usual way. Before considering the power 
of the test $S_3$ in small samples it is convenient to examine the meaning of the A.R.E. more
carefully. If the A.R.E. of a quick test relative to an efficient test is A, then asymptotically
$A^{-1}$ times as many observations have to be made for the quick test to give the same local power
as the efficient test. This is directly relevant if in designing an experiment a choice has to
be made between, on the one hand, using an efficient method of analysis and on the other
taking more observations and using a quick method of analysis. But it is not directly
relevant to the choice of a method of analysis for a given body of data, because it depends in
part on r, defined by (4), measuring the rate at which power increases with increasing N.
For a given problem r is fixed and so the A.R.E. can be reinterpreted in terms of the power
attained at a fixed sample size, but it seems preferable to compare tests directly in terms 
of power. 




Consider a test based on a statistic S normally distributed with mean $E(S \mid \Delta)$ and
standard deviation $D(S \mid \Delta)$, where the null hypothesis is $\Delta = 0$. The null hypothesis is
rejected at the significance level $\alpha$ if


$$S > E(S \mid 0) + \lambda_\alpha D(S \mid 0), \tag{45}$$
where
$$G(-\lambda_\alpha) = \alpha. \tag{46}$$
The power of the test is $G[p(\Delta)]$, where
$$p(\Delta) = \frac{E(S \mid \Delta) - E(S \mid 0) - \lambda_\alpha D(S \mid 0)}{D(S \mid \Delta)}. \tag{47}$$
Now
$$p'(0) = \left[ \frac{\partial p(\Delta)}{\partial \Delta} \right]_{\Delta = 0} = \frac{E'(S \mid 0) + \lambda_\alpha D'(S \mid 0)}{D(S \mid 0)}. \tag{48}$$
In all the applications in this paper $D'(S \mid 0) = 0$, so that
$$p'(0) = \frac{E'(S \mid 0)}{D(S \mid 0)} = R(S). \tag{49}$$
Near $\Delta = 0$,
$$p(\Delta) = \Delta R(S) - \lambda_\alpha + O(\Delta^2), \tag{50}$$


and in applications the first two terms give, asymptotically in N, the whole of the power
curve. Moreover, $R(S) \sim R N^r$ as $N \to \infty$ and comparable tests of a given hypothesis will
have the same r; hence we usually need to consider just R. We call $p(\Delta)$ the power deviate
and p'(0) the power derivative. Asymptotically the graph of p(A) against A is linear, and tests 
at different significance levels are given by parallel straight lines. Or to put the same fact 
another way, the power curves are asymptotically linear when plotted on arithmetical 
probability paper. 

Now consider the small sample theory with S possibly not normally distributed. Then if 
the power curves are plotted on probability paper they can be expected to form an approxi- 
mately parallel set of curves approaching a set of parallel lines as the sample size increases 
and the distribution of S tends to normality. This is of course only a method of presenting 
the results of power calculations, but we shall find it very convenient both in assessing the 
small-sample behaviour and in comparing different tests. 

Consider now two tests for which the asymptotic values of R(S) are $R_1$, $R_2$. Then
asymptotically in N the power curves for a given $\alpha$ are two lines on probability paper, the ratio of
their slopes being $R_1/R_2$ independent of $\alpha$; both lines intersect the probability axis at $\alpha$.

A first consequence is that there is no simple general relation between the difference in
the power of the two tests and the ratio $R_1/R_2$. If $R_1 \ne R_2$ we can, by taking $\alpha$ sufficiently
small, make the difference in power between the two tests arbitrarily near unity. In practice
we are probably only interested in $0.20 > \alpha > 0.001$, but the general conclusion remains that
the difference in power between a quick test and an efficient test will be greatest for small $\alpha$.
Table 2 expresses this quantitatively; it shows for given $R_1/R_2$ the powers of the two tests
at the point at which the difference in powers is greatest. The values in Table 2 are
independent of N, but the values of $\Delta$ at which these powers are attained do depend on N.
This is the restriction on the alternative hypothesis referred to in § 3. Thus if $R_1/R_2 = 0.7$
and $\alpha = 0.05$, the difference in power is greatest for the value of $\Delta$ at which the power of
the efficient test is 77% and of the quick test 51%.
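
The point of greatest difference can also be located numerically from the two-term form of (50). The sketch below is a rough check under the asymptotic theory only: it scans a grid of values of $x = R_2\Delta$, taking the powers of the efficient and quick tests as $G(x - \lambda_\alpha)$ and $G(x R_1/R_2 - \lambda_\alpha)$. The function name and the grid settings are arbitrary choices of this sketch. Because the difference is very flat near its maximum, the individual powers found in this way need not coincide with graphically determined values, although the maximum difference itself should agree closely.

```python
from statistics import NormalDist

_normal = NormalDist()


def max_power_gap(ratio, alpha, steps=20000, x_max=20.0):
    """Largest asymptotic difference in power between an efficient test
    and a quick test whose slope ratio R_1/R_2 is `ratio` (< 1).

    Returns (gap, quick_power, efficient_power) at the maximizing point.
    """
    lam = _normal.inv_cdf(1 - alpha)       # G(-lam) = alpha
    best = (0.0, alpha, alpha)
    for i in range(steps + 1):
        x = x_max * i / steps              # x stands for R_2 * Delta
        efficient = _normal.cdf(x - lam)
        quick = _normal.cdf(ratio * x - lam)
        if efficient - quick > best[0]:
            best = (efficient - quick, quick, efficient)
    return best


gap, quick_power, efficient_power = max_power_gap(0.7, 0.05)
```

For $R_1/R_2 = 0.7$ and $\alpha = 0.05$ the maximum difference found is about 0.26, in line with the difference between the 77% and 51% quoted above.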




Now consider the power of $S_3$ in small samples. Two methods can be used. The first is
to take the expansion (50) to higher powers of $\Delta$ and to introduce a correction for the
non-normality of S based on an Edgeworth expansion. This may be shown to give good results
even for very small N, and is a general method which could be used where direct numerical
calculation is difficult. However, for $S_3$ it is much easier to calculate the power directly from
the National Bureau of Standards tables of the binomial distribution (1950).


Table 2. Asymptotic theory. Powers (per cent) of quick and efficient tests
at points at which difference in power is greatest*

R_1/R_2 | α = 0.10 | α = 0.05 | α = 0.01 | α = 0.001
0.9     |  67, 78  |  63, 71  |  49, 60  |  54, 67
0.8     |  61, 74  |  56, 72  |  49, 71  |  43, 72
0.7     |  59, 80  |  51, 77  |  42, 77  |  39, 83
0.6     |  54, 84  |  47, 84  |  39, 86  |  29, 87
0.5     |  48, 88  |  41, 89  |  30, 90  |  20, 93
0.3     |  35, 96  |  27, 96  |  14, 97  |   7, 99


The power was computed in this way for N = 15 (15) 135, the significance level being the
largest value $\le 0.05$. Under the null hypothesis the test statistic is distributed as $(\tfrac{1}{2} + \tfrac{1}{2})^{\frac{1}{3}N}$
and under the alternative hypothesis as $(p + q)^{\frac{1}{3}N}$, where


$$p = G(-\tfrac{1}{3}\sqrt{2}\, N\Delta). \tag{51}$$


The power corresponding to given p, $\tfrac{1}{3}N$ can be read off directly from the tables and (51)
solved for $\Delta$. The results are given in Table 3. For comparative purposes the exact power of
the parametric test based on the regression coefficient, b, has been computed for the same
values of N and $\Delta$. When the standard deviation about the regression line is known, the
power is exactly $G[p(\Delta)]$, where


$$p(\Delta) = \{\tfrac{1}{12} N(N^2 - 1)\}^{\frac{1}{2}} \Delta - \lambda_\alpha. \tag{52}$$


To avoid rewriting the values of $\Delta$ in Table 4 the rows of both tables have been lettered, and
each entry in Table 4 relates to the value of $\Delta$ shown above the corresponding entry in
Table 3.
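
The calculation just described can be reproduced without tables. The sketch below is a minimal illustration assuming N is a multiple of three, a one-sided test that rejects for small values of $S_3$ (an upward trend makes unit scores less likely), and the sign convention of (51); the function names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

_normal = NormalDist()


def s3_critical(m, alpha=0.05):
    """Largest c with P(S <= c) <= alpha under (1/2 + 1/2)^m.

    Returns (c, attained_level); c = -1 if no level <= alpha is attainable.
    """
    c, cumulative = -1, 0.0
    for j in range(m + 1):
        step = cumulative + comb(m, j) / 2 ** m
        if step > alpha:
            break
        c, cumulative = j, step
    return c, cumulative


def s3_power(n, delta, alpha=0.05):
    """Exact level and power of the S_3 test against slope delta."""
    m = n // 3
    c, level = s3_critical(m, alpha)
    p = _normal.cdf(-sqrt(2) * n * delta / 3)        # equation (51)
    power = sum(comb(m, j) * p ** j * (1 - p) ** (m - j) for j in range(c + 1))
    return level, power


def b_power(n, delta, level):
    """Power of the regression-coefficient test, equation (52)."""
    lam = _normal.inv_cdf(1 - level)
    return _normal.cdf(sqrt(n * (n * n - 1) / 12) * delta - lam)


level_15, power_15 = s3_power(15, 0.2326)
power_b_15 = b_power(15, 0.2326, level_15)
```

For N = 15 and $\Delta = 0.2326$ (row j of the tables) this gives a level of about 0.031 with powers near 0.774 for $S_3$ and 0.979 for b, matching the tabulated entries.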

Asymptotically, the ratio of the R values of the two tests is, by (34) and (37), $4/(3\sqrt{\pi}) = 0.75$;
the interpretation of this in terms of power can be obtained from Table 2. The full curve in
Fig. 1 and the full curves in Fig. 2 for k = 0 show the power curves for N = 15, 30, and the
dotted lines are the corresponding asymptotic power curves. The small-sample power is
lower than the value given by the asymptotic theory, the difference being quite appreciable
in the region of 80-90% power. The power curves of the most efficient test are exactly linear
and differ from their asymptotic form only because of the very small difference between
$\{N(N^2 - 1)\}^{\frac{1}{2}}$ and $N^{\frac{3}{2}}$. Hence the test $S_3$ is less efficient relative to b than the asymptotic
* These values were obtained graphically by drawing on probability paper lines whose ratio of
slopes is $R_1/R_2$ and reading off the maximum difference in probability between them. The differences
in power are determined accurately, but it is rather difficult to find the precise point of maximum
difference. The values in Table 2 involve $R_1$, $R_2$ only through their ratio $R_1/R_2$.




Table 3. Exact power of S_3 test against normal regression alternatives
Values of the standardized regression coefficient, Δ, are given in parentheses, and the corresponding
power appears immediately below. Entries shown as ... are not legible in the source.

Sample size (N)      |    15    |    30    |    45    |    60    |    75    |    90    |   105    |   120    |   135
Significance level α |  0.031   |  0.011   |  0.018   |  0.021   |  0.022   |  0.049   |  0.045   |  0.040   |  0.036
a                    | (0.0035) | (0.0018) | (0.0012) | (0.0009) | (0.0007) | (0.0006) | (0.0005) | (0.0004) | (0.0004)
                     |  0.035   |  0.013   |  0.021   |  0.026   |  0.027   |  0.062   |  0.057   |  0.053   |  0.048
b                    | (0.0178) | (0.0089) | (0.0059) | (0.0044) | (0.0036) | (0.0030) | (0.0025) | (0.0022) | (0.0020)
                     |  0.050   |  0.023   |  0.042   |  0.055   |  0.064   |  0.135   |  0.134   |  0.133   |  0.130
c                    | (0.0358) | (0.0179) | (0.0119) | (0.0090) | (0.0072) | (0.0060) | (0.0051) | (0.0045) | (0.0040)
                     |  0.078   |  0.046   |  0.091   |  0.126   |  0.154   |  0.291   |  0.306   |  0.317   |  0.327
d                    | (0.0545) | (0.0272) | (0.0182) | (0.0136) | (0.0109) | (0.0091) | (0.0078) | (0.0068) | (0.0060)
                     |  0.116   |  0.086   |  0.173   |  0.245   |  0.306   |  0.508   |  0.542   |  0.572   |  0.598
e                    | (0.0742) | (0.0371) | (0.0247) | (0.0185) | (0.0148) | (0.0124) | (0.0106) | (0.0093) | (0.0082)
                     |  0.168   |  0.149   |  0.297   |  0.416   |  0.512   |  0.730   |  0.773   |  0.807   |  0.836
f                    | (0.0954) | (0.0477) | (0.0318) | (0.0238) | (0.0191) | (0.0159) | (0.0136) | (0.0119) | (0.0106)
                     |  0.237   |  0.244   |  0.461   |  0.617   |  0.727   |  0.894   |  0.924   |  0.946   |  0.961
g                    | (0.1190) | (0.0595) | (0.0397) | (0.0298) |   ...    | (0.0198) | (0.0170) | (0.0149) | (0.0132)
                     |  0.328   |  0.376   |  0.648   |  0.804   |   ...    |  0.974   |  0.986   |  0.992   |  0.996
h                    | (0.1466) | (0.0733) | (0.0489) | (0.0366) | (0.0293) | (0.0244) | (0.0209) | (0.0183) | (0.0163)
                     |  0.444   |  0.544   |  0.823   |  0.933   |  0.975   |  0.997   |  0.999   |  1.000   |  1.000
i                    | (0.1812) | (0.0906) | (0.0604) | (0.0453) | (0.0362) | (0.0302) | (0.0259) | (0.0227) | (0.0201)
                     |  0.590   |  0.736   |  0.944   |  0.989   |  0.998   |  1.000   |  1.000   |  1.000   |  1.000
j                    | (0.2326) | (0.1163) | (0.0775) | (0.0582) | (0.0465) | (0.0388) | (0.0332) | (0.0291) | (0.0258)
                     |  0.774   |  0.914   |  0.995   |  1.000   |  1.000   |  1.000   |  1.000   |  1.000   |  1.000


Table 4. Exact power of b test against normal regression alternatives
Values of Δ are given in parentheses above the corresponding entry in Table 3.

Sample size (N)      |  15   |  30   |  45   |  60   |  75   |  90   |  105  |  120  |  135
Significance level α | 0.031 | 0.011 | 0.018 | 0.021 | 0.022 | 0.049 | 0.045 | 0.040 | 0.036
a                    | 0.036 | 0.013 | 0.023 | 0.027 | 0.030 | 0.066 | 0.062 | 0.057 | 0.054
b                    | 0.059 | 0.030 | 0.056 | 0.074 | 0.088 | 0.179 | 0.182 | 0.183 | 0.188
c                    | 0.103 | 0.074 | 0.143 | 0.201 | 0.249 | 0.429 | 0.457 | 0.481 | 0.502
d                    | 0.171 | 0.157 | 0.300 | 0.416 | 0.509 | 0.722 | 0.764 | 0.799 | 0.827
e                    | 0.267 | 0.294 | 0.519 | 0.673 | 0.775 | 0.919 | 0.945 | 0.962 | 0.978
f                    | 0.395 | 0.485 | 0.746 | 0.877 | 0.941 | 0.988 | 0.994 | 0.997 | 0.999
g                    | 0.551 | 0.699 | 0.911 | 0.975 | 0.993 | 0.999 | 1.000 | 1.000 | 1.000
h                    | 0.772 | 0.880 | 0.984 | 0.998 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
i                    | 0.879 | 0.977 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000
j                    | 0.979 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000




theory suggests. The difference is greater for smaller $\alpha$. The corresponding graphs to Figs. 1
and 2 for higher N show that for $\alpha$ in the range 0.01-0.05, the asymptotic theory applies
well for $N \ge 60$.

The next thing is to investigate whether the form of $S_3$, involving the rejection of the
middle third of the set of observations, can profitably be modified in small samples. Suppose
that $(\tfrac{1}{3}N - 2k)$ observations are rejected so that the number of comparisons is $(\tfrac{1}{3}N + k)$;
the exact power function can be worked out from the binomial tables as before, but an
immediate comparison is not possible because the significance levels for different values of
k cannot be made equal, except by the artificial device of randomized tests. However, if
the curves are plotted on probability paper they are almost parallel for different $\alpha$ and an
increase in power with change in k would be shown by decreasing curvature. As would be
expected, a negative k leads to a loss of power. Fig. 2 shows for N = 30 the curves for
k = 0, 1, 2. There is a tendency for the curvature to increase as $\alpha$ decreases, but there does
not seem to be any systematic change with k. Therefore, although an increase in k increases
the number of available significance levels, it does not appreciably increase power. Hence
there seems to be little value in modifying the $\tfrac{1}{3}$ rule in small samples; similar calculations
for N = 15 confirm this.

Fig. 1. Power of $S_3$ for N = 15, plotted on a probability scale (per cent) against $\Delta$.
Full curve: exact power. Broken curve: value from asymptotic theory.

Fig. 2. Power of $S_3$ for N = 30, plotted on a probability scale (per cent) against $\Delta$.
Full curves: exact power. Broken curve: value from asymptotic theory.
Full curves are in descending order k = 0, 1, 2, 0, 1, 2, the second set having lower values of
$\alpha$ than the first. Broken curve is k = 0.


We have not made the corresponding investigations for $S_1$.


10. EXAMPLES OF USE OF THE SIGN TESTS AGAINST TREND IN LOCATION 


To illustrate the $S_1$ and $S_3$ tests, we use the figures of annual rainfall at Oxford for the years 1858-1952, quoted by Foster & Stuart (1954, Table 9).

For $S_1$ we compare the kth observation with the $(N-k+1)$th, scoring 1 when the former is the larger and 0 when it is the smaller. The unit scores are then weighted by the distance apart of the observations compared, i.e. by $(N-2k+1)$. In this case N = 95 and is odd, so that we must ignore the middle observation and proceed with N = 94. The unit scores are those with weights as follows:

89, 83, 79, 77, 75, 71, 65, 59, 57, 55, 51, 49, 47, 45, 33, 31, 27, 15, 3.

The value of $S_1$ is the sum of these weights, 1011. From (16), with N = 94, we have

$$E(S_1) = 1104.5, \quad D^2(S_1) = 34603.75, \quad D(S_1) = 186.0.$$

The observed value of $S_1$ thus represents a deviation from expectation of almost exactly one-half its standard error and is therefore in good agreement with the null hypothesis of zero trend.

If, alternatively, we were to use the simpler $S_3$ test, we compare the kth observation with the $(\tfrac23 N + k)$th, scoring 1 or 0 as before, but no weighting is necessary. Since N = 95 and is not a multiple of three, we retain the extra observations (in accordance with our findings at the end of §9 above) and compare each of the first thirty-two observations with the corresponding observation in the last thirty-two. For our sequence of scores we obtain

10101001100000000111111110001000

the total score $S_3$ being 14. This clearly agrees well with the expected value of $\tfrac12 \times 32 = 16$. The standard error of $S_3$ is, from (33), $\sqrt{(\tfrac14 \times 32)} = 2.83$, so that the deviation from expectation, corrected for continuity, is just over one-half of a standard error.
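The two procedures just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, ties are split half-and-half as suggested in §16, the null mean and standard error are computed directly from the weights rather than quoted from (16) and (33), and the number of comparisons for $S_3$ is taken as N/3 rounded up, which is an assumption that reproduces the thirty-two comparisons used above for N = 95.

```python
from math import sqrt


def s1_test(x):
    """Weighted sign statistic S1 with its null mean and standard error."""
    x = list(x)
    if len(x) % 2 == 1:
        del x[len(x) // 2]          # ignore the middle observation when N is odd
    n = len(x)
    score = 0.0
    weights = []
    for k in range(1, n // 2 + 1):
        w = n - 2 * k + 1           # distance apart of the pair compared
        weights.append(w)
        earlier, later = x[k - 1], x[n - k]
        if earlier > later:
            score += w              # unit score 1 when the former is larger
        elif earlier == later:
            score += 0.5 * w        # tie: half a comparison each way (see section 16)
    mean = sum(weights) / 2.0       # each comparison scores 1 with probability 1/2
    sd = sqrt(sum(w * w for w in weights) / 4.0)
    return score, mean, sd


def s3_test(x):
    """Unweighted sign statistic S3: first third against last third."""
    x = list(x)
    n = len(x)
    m = -(-n // 3)                  # assumed rule: N/3 rounded up
    score = 0.0
    for k in range(m):
        earlier, later = x[k], x[n - m + k]
        if earlier > later:
            score += 1.0
        elif earlier == later:
            score += 0.5
    return score, m / 2.0, sqrt(m / 4.0)
```

For a series of length 95 these functions return null means of 1104.5 and 16 and standard errors of about 186.0 and 2.83, the values used in the rainfall example.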


11. SIGN TESTS FOR TREND IN DISPERSION 
We consider now tests for a trend not in location but in the dispersion about a fixed location. 
For example, in a regression problem we may want to test quickly whether the scatter 
about the regression curve increases as the independent variable increases. 

Divide the series $x_1, \ldots, x_N$ into sets $x_1, \ldots, x_k$; $x_{k+1}, \ldots, x_{2k}$; ..., rejecting a few observations in the centre of the original series if N is not exactly divisible by k. The best choice of k is discussed below. For each set of k observations find the range, w, thus getting a series of ranges $w_1, \ldots, w_r$, where r is the integral part of N/k. The ranges are then tested for trend by one or other of the tests $S_1$ and $S_3$.

If the null hypothesis is that $x_1, \ldots, x_N$ are independently distributed with constant dispersion about a regression line, $w_1, \ldots, w_r$ are independent and identically distributed. If the regression is not linear the w's will be approximately identically distributed unless the trend within sets of observations varies appreciably.

In the next section, a valid test is obtained for any k, and the best value of k for detecting certain special forms of trend is found for large samples. The behaviour for small samples has not been investigated. The following provisional rules are suggested:

if $N \ge 90$ take k = 5,
if $90 > N \ge 64$ take k = 4,
if $64 > N \ge 48$ take k = 3,
if $48 > N$ take k = 2.


Except when N is very large it is probably advisable to use the weighted, rather than the 
unweighted, sign test. 
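The preparatory step of this section, choosing k and forming the ranges, might be sketched as follows. The names `choose_k` and `set_ranges` are hypothetical, the boundary cases of the provisional rule are read as "greater than or equal to", and the surplus observations are dropped as one block from the centre of the series; all three are assumptions of this sketch.

```python
def choose_k(n):
    """Provisional rule for the set size k, given the series length n."""
    if n >= 90:
        return 5
    if n >= 64:
        return 4
    if n >= 48:
        return 3
    return 2


def set_ranges(x, k):
    """Ranges of consecutive sets of k observations.

    If len(x) is not divisible by k, the surplus observations are
    rejected from the centre of the series before the sets are formed.
    """
    x = list(x)
    extra = len(x) % k
    if extra:
        start = (len(x) - extra) // 2
        del x[start:start + extra]
    return [max(x[i:i + k]) - min(x[i:i + k]) for i in range(0, len(x), k)]
```

The resulting list of ranges is then treated as a new series and tested for trend by either sign test.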




12. CHOICE OF k FOR DISPERSION TESTS

To investigate the theory of the test for trend in dispersion we take a special form for the null and alternative hypotheses. Suppose that $x_1, \ldots, x_N$ are independently normally distributed with constant mean and with standard deviations $\sigma(1), \ldots, \sigma(N)$, where $\sigma(n)$ varies at most slowly with n. Then the ranges $w_1, \ldots, w_r$ defined in §11 have very nearly the distribution of ranges of k observations drawn from normal populations of standard deviations $\sigma_1 = \sigma(\tfrac12 k)$, $\sigma_2 = \sigma(\tfrac32 k)$, .... Patnaik (1950) has shown that a range of k observations can be represented to a close approximation as a multiple of a $\chi$-variate with suitably chosen degrees of freedom, $\nu_k$. Therefore $w_i^2\sigma_j^2/(w_j^2\sigma_i^2)$ is approximately an F variate with $(\nu_k, \nu_k)$ degrees of freedom. Hence

$$\operatorname{prob}(w_i/w_j > 1) = \int_{\sigma_j^2/\sigma_i^2}^{\infty} \frac{\Gamma(\nu_k)}{\{\Gamma(\tfrac12\nu_k)\}^2}\, x^{\frac12\nu_k - 1}(1+x)^{-\nu_k}\,dx \simeq \tfrac12 + A_k\{(\sigma_i/\sigma_j)^2 - 1\},$$

where

$$A_k = \frac{\Gamma(\nu_k)}{2^{\nu_k}\{\Gamma(\tfrac12\nu_k)\}^2},$$

provided that $(\sigma_i/\sigma_j)^2 - 1$ is small.

If we assume that the trend in standard deviation is such that $\sigma(n) = \sigma e^{\gamma n} \simeq \sigma(1 + \gamma n)$, where $\gamma N \ll 1$, so that $\gamma$ is the fractional increase in standard deviation per observation,

$$(\sigma_i/\sigma_j)^2 - 1 \simeq 2k\gamma(i-j)$$

and

$$\operatorname{prob}(w_i/w_j > 1) = \tfrac12 + 2k\gamma(i-j)A_k.$$
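The coefficient $A_k$ and the resulting comparison probability can be evaluated numerically; the sketch below is illustrative only and its function names are hypothetical. It takes $A_k$ to be the F density with equal degrees of freedom evaluated at unity, which is what the linear approximation above requires. Patnaik's values of $\nu_k$ are not reproduced here; the only case used is k = 2, where the range of two normal observations is an exact multiple of a $\chi$-variate with one degree of freedom.

```python
from math import gamma, sqrt


def a_coefficient(nu):
    """A_k as a function of the equivalent degrees of freedom nu."""
    return gamma(nu) / (2.0 ** nu * gamma(nu / 2.0) ** 2)


def prob_range_ratio_exceeds_one(k, nu, gamma_rate, lag):
    """First-order approximation to prob(w_i / w_j > 1) with lag = i - j."""
    return 0.5 + 2.0 * k * gamma_rate * lag * a_coefficient(nu)


# k = 2: nu = 1 exactly, so A_2 = 1 / (2 * pi) and A_2 / sqrt(2) is about 0.1125
print(a_coefficient(1.0) / sqrt(2.0))
```

The printed value agrees with the entry for k = 2 in Table 5.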


Consider first the application to the ranges of the unweighted sign test, $S_3$. From the r = N/k ranges we make approximately r/3 independent comparisons in each of which $i - j \simeq \tfrac23 r$. Therefore if S is the total score, its mean and standard deviation are given by

$$E(S \mid \gamma) = \frac{N}{6k} + \frac{4N^2 A_k}{9k}\,\gamma, \qquad D(S \mid \gamma) = \left(\frac{N}{12k}\right)^{1/2} + O(\gamma).$$

Therefore the power derivative, $p_3'(0)$, of the test is

$$p_3'(0) = \frac{E'(S \mid 0)}{D(S \mid 0)} = \frac{8\sqrt3\,N^{3/2}A_k}{9\sqrt k}. \tag{53}$$

An exactly analogous calculation for the weighted sign test $S_1$ gives

$$p_1'(0) = \frac{2\sqrt6\,N^{3/2}A_k}{3\sqrt k}. \tag{54}$$

Thus in both cases the asymptotically best value of k is the one that maximizes $A_k/\sqrt k$. From Patnaik's table of $\nu_k$ the values in Table 5 have been computed.

Now the number of ranges is N/k and the number of comparisons is one-half or one-third of this and is therefore small even when N is, by usual standards, quite large. Therefore it is advisable to use smaller values of k than the theoretical optimum in large samples. In the absence of an investigation of the small-sample properties of the test, the rule of §11




Table 5. Determination of efficiencies of different set sizes, k, for testing trend in dispersion

k | $A_k/\sqrt k$ | k | $A_k/\sqrt k$
2 | 0.112 | 6 | 0.167
3 | 0.141 | 7 | 0.169
4 | 0.158 | 8 | 0.170
5 | 0.164 | 9 | 0.169


is suggested. This is based on the considerations that there is little gain in taking k > 5 and that it is advisable, whenever possible, to have at least sixteen ranges.

If we substitute, in (53) and (54), the value $A_k/\sqrt k \simeq 0.16$, we have

$$p_3'(0) = 0.246N^{3/2}, \qquad p_1'(0) = 0.261N^{3/2}. \tag{55}$$

It remains to compare (55) with the corresponding quantity for the maximum-likelihood test of the corresponding parametric hypotheses.


13. A.R.E. OF DISPERSION TESTS 


For simplicity assume that $x_1, \ldots, x_N$ are normally and independently distributed with zero mean and that the standard deviation of $x_n$ is $\sigma e^{\gamma n}$, where $\gamma N$ is small. The log likelihood is

$$L = -\tfrac12 N\log(2\pi) - N\log\sigma - \gamma\sum_{n=1}^{N} n - \frac{1}{2\sigma^2}\sum_{n=1}^{N} x_n^2 e^{-2\gamma n}.$$

If we differentiate and take expectations, retaining only the terms independent of $\gamma$, and letting N tend to infinity, we get

$$E\left(\frac{\partial^2 L}{\partial\sigma^2}\right) = -\frac{2N}{\sigma^2}, \quad E\left(\frac{\partial^2 L}{\partial\sigma\,\partial\gamma}\right) = -\frac{N^2}{\sigma}, \quad E\left(\frac{\partial^2 L}{\partial\gamma^2}\right) = -\tfrac23 N^3. \tag{56}$$

The large-sample variance of $\hat\gamma$, the maximum-likelihood estimate of $\gamma$, is given by inverting the Hessian matrix with elements (56). We get when $\gamma$ is small and N is large

$$V(\hat\gamma) \sim 6/N^3. \tag{57}$$

Thus the power derivative of the test based on the maximum-likelihood estimate is

$$p'(0) = \frac{N^{3/2}}{\sqrt6} = 0.408N^{3/2}. \tag{58}$$

(58) still applies if the mean is unknown or if a linear trend in mean has to be estimated. From the formulae (55) and (58), and the fact that $p'(0) = R(x)$, it follows, on using (2) with r = 3, that the A.R.E.'s of the tests $S_3$, $S_1$ compared with the maximum-likelihood test are about 71 and 74% respectively.
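The arithmetic behind these efficiencies can be checked with a short sketch. Formula (2) lies outside this section, so the sketch assumes that, for power derivatives proportional to $N^{r/2}$, the A.R.E. is the ratio of the coefficients raised to the power 2/r; this assumption is adopted only because it reproduces the quoted 71 and 74%. The names are hypothetical.

```python
from math import sqrt

A_OVER_SQRT_K = 0.16                              # value substituted in (53) and (54)

coef_s3 = 8.0 * sqrt(3.0) / 9.0 * A_OVER_SQRT_K   # coefficient of N^(3/2) for S3
coef_s1 = 2.0 * sqrt(6.0) / 3.0 * A_OVER_SQRT_K   # coefficient of N^(3/2) for S1
coef_ml = 1.0 / sqrt(6.0)                         # coefficient in (58)


def relative_efficiency(coef_test, coef_reference, r=3):
    """Assumed form of the A.R.E.: ratio of coefficients to the power 2/r."""
    return (coef_test / coef_reference) ** (2.0 / r)


print(round(coef_s3, 3), round(coef_s1, 3), round(coef_ml, 3))
print(round(100 * relative_efficiency(coef_s3, coef_ml)),
      round(100 * relative_efficiency(coef_s1, coef_ml)))
```

The first line of output gives the three coefficients appearing in (55) and (58); the second gives the two percentages.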
A test entirely analogous to the above tests can be found by calculating the variances within each set of k instead of the range. This is slightly more efficient in the parametric case but much of the simplicity of the test is lost and the increase in power may be shown to be trivial.




14. EXAMPLES OF THE USE OF SIGN TESTS AGAINST TREND IN DISPERSION 


We again use for illustrative purposes the rainfall data quoted by Foster & Stuart (1954, 
Table 9). Using the provisional rule given in §11 above, we take ranges of sets of five ob- 
servations. Since N = 95, this gives us exactly nineteen sets, no rejection of observations 
being necessary. The nineteen ranges are: 


9.64, 12.30, 12.01, 11.45, 5.43, 13.05, 9.86, 10.89, 6.95, 15.03,
11.34, 6.63, 12.19, 8.55, 4.80, 11.00, 7.76, 7.03, 10.98.

If we apply the test $S_1$ we drop the middle value and take the signs of 10.98 - 9.64, 7.03 - 12.30, ... down to 11.34 - 6.95, thus obtaining

score:  0  1  1  1  1  1  0  1  0
weight: 17 15 13 11 9  7  5  3  1

The total score is therefore 58, and from (16) with N = 18 the mean score is 40.5 and the variance is 242.25, so that the standard error is 15.6. The deviation from expectation is about 1.12 standard errors, and so the two-sided normal significance level is about 27%. The exact significance level is, by enumeration, 73/256 ≈ 28.5%.

If we use the test $S_3$ we reject the middle five of the nineteen ranges and take the signs of 12.19 - 9.64, etc. This gives scores

0 1 1 1 0 1 0

There is clearly good agreement with an equal probability for zeros and ones; significance would be tested in the binomial distribution $(\tfrac12 + \tfrac12)^7$. The test $S_3$ is not to be recommended in the present instance because with only seven comparisons the loss of sensitivity compared to the $S_1$ test would be considerable.

Thus although there is a slight indication that the dispersion decreases with time, both 
tests suggest that this could easily be a sampling fluctuation. 
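The scores in this example can be reproduced from the nineteen ranges listed above. The sketch below is illustrative; the variable names are hypothetical, and the null mean and variance of $S_1$ are computed directly from the weights rather than from (16).

```python
ranges = [9.64, 12.30, 12.01, 11.45, 5.43, 13.05, 9.86, 10.89, 6.95, 15.03,
          11.34, 6.63, 12.19, 8.55, 4.80, 11.00, 7.76, 7.03, 10.98]

# S1: drop the middle range, then compare the kth with the (N - k + 1)th.
w = ranges[:9] + ranges[10:]
n = len(w)                                        # 18
weights = [n - 2 * k + 1 for k in range(1, n // 2 + 1)]
s1_scores = [1 if w[k - 1] > w[n - k] else 0 for k in range(1, n // 2 + 1)]
s1_total = sum(s * wt for s, wt in zip(s1_scores, weights))
s1_mean = sum(weights) / 2.0
s1_variance = sum(wt * wt for wt in weights) / 4.0

# S3: reject the middle five ranges and compare the first seven with the last seven.
m = 7
s3_scores = [1 if ranges[k] > ranges[len(ranges) - m + k] else 0 for k in range(m)]

print(s1_scores, s1_total, s1_mean, s1_variance)
print(s3_scores)
```

No ties occur among the compared ranges, so no half scores are needed here.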


15. SEQUENTIAL TESTS 


Finally, we point out the possibility of constructing sequential tests for trend related to the 
tests considered above. While this paper was in preparation an abstract (Noether, 1954) 
appeared describing briefly a test rather similar to the one we had developed. Hence a full 
discussion will not be attempted here. However, some calculations in a special case suggest that the average sample sizes under the null and alternative hypotheses are, for the sequential sign tests, only a little greater than the corresponding parametric fixed sample size.

Sequential sign tests for trend are only likely to be of practical value under rather exceptional circumstances. For they require that observations are sufficiently easy to obtain for it to be worth while to use inefficient methods of analysis, and yet sufficiently difficult to obtain for the saving from the use of a sequential method to be important.
A possible application is to the marking of a large number of examination scripts. If they 
are marked in alphabetical order it may be useful to test, as the marking proceeds, for a 
trend in the marks, which would indicate a changing standard of marking. A sequential 
method is appropriate and yet elaborate calculations would be out of place. 




16. GENERAL COMMENTS 


The calculation of the efficiency of the above tests and the determination of optimum weightings, etc., has been based on a particular type of alternative hypothesis. It is clear in a general way that the tests will remain effective for detecting monotone trends. Positive serial correlation among the observations would increase the chance of a significant answer even in the absence of a trend.

The occurrence of ties has been ignored in the above work. A small number of ties can be dealt with by counting one-half a comparison in each direction, i.e. if $y_i = y_j$ we calculate as if one-half a comparison has $y_i > y_j$ and one-half has $y_i < y_j$. If a substantial proportion of the comparisons are ties a special investigation is necessary or the comparisons should be randomized.

Estimates for the trend could be constructed from the test statistics $S_1$, $S_3$. It is very doubtful if such estimates would be of value; in any case, in much work with quick tests, if the trend is shown to be significant it can be estimated graphically.


REFERENCES 


Brown, G. W. & Mood, A. M. (1951). On median tests for linear hypotheses. Proceedings of the Second Berkeley Symposium, pp. 159-66. University of California Press.

Foster, F. G. & Stuart, A. (1954). Distribution-free tests in time-series based on the breaking of records. J. R. Statist. Soc. Series B (Methodological), 16, 1-22.

National Bureau of Standards (1950). Tables of the Binomial Probability Distribution, Applied Mathematics Series, no. 6. Washington.

Noether, Gottfried E. (1954). Abstract of 'A sequential test of randomness against linear trend'. Ann. Math. Statist. 25, 176.

Noether, Gottfried E. (1955). 'The asymptotic relative efficiencies of tests of hypotheses', Ann. Math. Statist. (to be published).

Patnaik, P. B. (1950). The use of mean range as an estimator of variance in statistical tests. Biometrika, 37, 78-87.

Pitman, E. J. G. (1948). Lecture notes on 'Non-parametric statistical inference', University of North Carolina Institute of Statistics (mimeographed).

Stuart, Alan (1954). The asymptotic relative efficiencies of distribution-free tests of randomness against normal alternatives. J. Amer. Statist. Ass. 49, 147-57.

Theil, H. (1950). A rank-invariant method of linear and polynomial regression analysis. Indag. math. 12, 85-91, 173-7.




THE VARIANCE OF THE MAXIMUM OF PARTIAL SUMS OF A 
FINITE NUMBER OF INDEPENDENT NORMAL VARIATES 


By A. A. ANIS 


Cairo University 


1. THE PROBLEM 
Consider n independent standard normal variates $X_1, X_2, \ldots, X_n$ and their partial sums

$$S_r = X_1 + X_2 + \ldots + X_r \quad (r = 1, 2, \ldots, n).$$

Let

$$U_n = \max_{1 \le r \le n}(S_r)$$

denote the maximum of these partial sums.

In a paper by Anis & Lloyd (1953) the expectation of $U_n$ was studied and it was shown that

$$E(U_n) = (2\pi)^{-1/2}\sum_{s=1}^{n-1} s^{-1/2}.$$

In the present paper the second moment about zero and hence the variance of $U_n$ is obtained (see equation (7.1)).

We shall always use the symbol $\phi(x)$ to denote the probability density function of the standard normal variate, i.e.

$$\phi(x) = (2\pi)^{-1/2}\exp(-\tfrac12 x^2),$$

and $F_n(x)$, $f_n(x)$ to denote respectively the distribution function and the probability density function of $U_n$:

$$F_n(x) = \Pr(U_n \le x), \quad f_n(x) = \frac{d}{dx}F_n(x).$$

We have

$$F_n(u) = \int\!\cdots\!\int_K \prod_{r=1}^{n}\phi(x_r)\,dx_r \quad (-\infty < u < \infty), \tag{1.1}$$

where the region of integration K is defined by

$$K:\; S_r \le u \quad (r = 1, 2, \ldots, n).$$

It may be deduced that

$$F_n(u) = \int_0^\infty F_{n-1}(t)\,\phi(u-t)\,dt, \tag{1.2}$$

and that

$$f_n(u) = F_{n-1}(0)\,\phi(u) + \int_0^\infty f_{n-1}(t)\,\phi(u-t)\,dt. \tag{1.3}$$

2. THREE LEMMAS ON THE $F_r(0)$

At this stage we state three results relating to the $F_r(0)$ which we shall need in the sequel.

LEMMA 1.

$$\sum_{r=0}^{n} F_r(0)\,F_{n-r}(0) = 1. \tag{2.1}$$

This was proved in Anis & Lloyd (1953), and is repeated here merely for completeness.

LEMMA 2.

$$F_r(0) = (2r)!/\{2^{2r}(r!)^2\}. \tag{2.2}$$

To prove this we define a generating function

$$M(\lambda) = \sum_{r=0}^{\infty} \lambda^r F_r(0).$$

Then, using Lemma 1, it is readily seen that

$$M^2(\lambda) = (1-\lambda)^{-1}. \tag{2.3}$$

Picking out the appropriate coefficient from $M(\lambda) = (1-\lambda)^{-1/2}$ gives the required result.

LEMMA 3.

$$\sum_{r=0}^{n} r\,F_r(0)\,F_{n-r}(0) = \tfrac12 n.$$

This follows from Lemma 2 on differentiating (2.3) and equating coefficients of $\lambda^n$.
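The three lemmas are easy to check in exact arithmetic for small n. The sketch below is only a numerical illustration, not part of the proof; the function names are hypothetical, and $F_0(0)$ is taken as 1, as the generating function requires.

```python
from fractions import Fraction
from math import comb


def f_zero(r):
    """F_r(0) from Lemma 2: (2r)! / (2^(2r) (r!)^2)."""
    return Fraction(comb(2 * r, r), 4 ** r)


def lemma1_sum(n):
    """Left-hand side of Lemma 1."""
    return sum(f_zero(r) * f_zero(n - r) for r in range(n + 1))


def lemma3_sum(n):
    """Left-hand side of Lemma 3."""
    return sum(r * f_zero(r) * f_zero(n - r) for r in range(n + 1))


for n in range(6):
    print(n, lemma1_sum(n), lemma3_sum(n))
```

Each line of output shows n, then 1, then n/2.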


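3. THE SECOND MOMENT OF $U_n$ AS A LINEAR COMPOUND OF THE $F_r(0)$

The second moment $\mu_2'(n)$ of $U_n$ is given by

$$\mu_2'(n) = \int_{-\infty}^{\infty} u^2 f_n(u)\,du,$$

and, using the reduction formula (1.3), this becomes

$$\mu_2'(n) = F_{n-1}(0) + \int_{-\infty}^{\infty}\!\int_0^\infty u^2\,\phi(u-t)\,f_{n-1}(t)\,dt\,du.$$

The double integral can be integrated once, with respect to u. Using well-known properties of $\phi(x)$, and remembering that $F_{n-1}$ is the integral function of $f_{n-1}$, we obtain

$$\mu_2'(n) = 1 + \int_0^\infty t^2 f_{n-1}(t)\,dt. \tag{3.1}$$

We now use the reduction formula (1.3) a second time, resulting in

$$\mu_2'(n) = 1 + F_{n-2}(0)\int_0^\infty t^2\phi(t)\,dt + \int_0^\infty\!\int_0^\infty t^2\,\phi(t-v)\,f_{n-2}(v)\,dv\,dt.$$

The last integral may be reduced in the same way. Continuing, we find

$$\mu_2'(n) = 1 + \sum_{r=1}^{n-1} g_r F_{n-r-1}(0), \tag{3.2}$$

where

$$g_r = \int_0^\infty\!\cdots\!\int_0^\infty y_1^2\,\phi(y_1-y_2)\,\phi(y_2-y_3)\cdots\phi(y_{r-1}-y_r)\,\phi(y_r)\prod_{i=1}^{r} dy_i. \tag{3.3}$$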


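4. THE COEFFICIENTS $g_r$ OF THE LINEAR COMPOUND

We now seek to evaluate the $g_r$. As a first step we make the transformation

$$x_i = y_i - y_{i+1} \quad (i = 1, 2, \ldots, r;\; y_{r+1} = 0), \qquad\text{or}\qquad y_i = \sum_{j=i}^{r} x_j \quad (i = 1, 2, \ldots, r). \tag{4.1}$$

We then have

$$g_r = \int\!\cdots\!\int_R (x_1 + \ldots + x_r)^2\prod_{i=1}^{r}\phi(x_i)\,dx_i, \tag{4.2}$$

where the region of integration is

$$R:\; \sum_{j=i}^{r} x_j \ge 0 \quad (i = 1, 2, \ldots, r).$$

Hence $g_r = L_r + H_r$, where

$$L_r = \int\!\cdots\!\int_R \Bigl(\sum_{i=1}^{r} x_i^2\Bigr)\prod_{i=1}^{r}\phi(x_i)\,dx_i, \qquad H_r = 2\int\!\cdots\!\int_R \Bigl(\sum_{i<j} x_i x_j\Bigr)\prod_{i=1}^{r}\phi(x_i)\,dx_i. \tag{4.3}$$

We now consider these two integrals separately.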


5. EVALUATION OF $L_r$

The integral $L_r$ of (4.3) is readily evaluated, as follows. Since the integrand is spherically symmetrical the value of the integral is proportional to the magnitude of the r-dimensional solid angle defined by the region of integration R; and this in turn is proportional to the $F_r(0)$, since by (1.1) we have

$$F_r(0) = \int\!\cdots\!\int_R \prod_{i=1}^{r}\phi(x_i)\,dx_i. \tag{5.1}$$

Let us write $F_r^0(0)$ to denote the integral of the integrand of $F_r(0)$ taken over the whole r-space, and let $L_r^0$ be similarly related to $L_r$; then the proportionality of our functions $F_r(0)$ and $L_r$ to the solid angle R enables us to write

$$F_r(0) : F_r^0(0) = L_r : L_r^0,$$

whence

$$L_r = L_r^0 F_r(0), \tag{5.2}$$

since $F_r^0(0) = 1$.

The value of $L_r^0$ is easily obtained by considering the standard integral

$$(2\pi)^{-r/2}\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty}\exp\Bigl(-\tfrac12 k\sum_{i=1}^{r} x_i^2\Bigr)\prod_{i=1}^{r} dx_i = k^{-r/2}.$$

If we differentiate this expression with respect to k and then put k = 1 we find $L_r^0 = r$. Hence, from (5.2),

$$L_r = rF_r(0). \tag{5.3}$$

The $F_r(0)$ are, of course, known explicitly.


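6. EVALUATION OF $H_r$

We now consider the integral $H_r$ of (4.3). We first transform back to the original variables $y_i$ by the transformation (4.1). In this process the cross-product term transforms as follows:

$$\sum_{i<j} x_i x_j = \sum_{i=1}^{r-1} x_i\Bigl(\sum_{j=i+1}^{r} x_j\Bigr) = \sum_{i=1}^{r-1}(y_i - y_{i+1})\,y_{i+1} = \sum_{s=1}^{r-1}(2y_s - y_{s-1} - y_{s+1})\sum_{i=s+1}^{r} y_i,$$

provided we interpret $y_0$ and $y_{r+1}$ conventionally as

$$y_0 = y_1, \quad y_{r+1} = 0. \tag{6.1}$$

Expressing $H_r$ in terms of the y's, and retaining these conventions, we then have

$$\tfrac12 H_r = \sum_{s=1}^{r-1} K_s(r), \tag{6.2}$$

where

$$K_s(r) = \int_0^\infty\!\cdots\!\int_0^\infty (2y_s - y_{s-1} - y_{s+1})\Bigl(\sum_{i=s+1}^{r} y_i\Bigr)\prod_{i=1}^{r}\phi(y_i - y_{i+1})\,dy_i. \tag{6.3}$$

The reason for writing $K_s(r)$ in this form is that it enables us to perform one of the integrations at once, since

$$\int_0^\infty \phi(y_{s-1} - y_s)\,(2y_s - y_{s-1} - y_{s+1})\,\phi(y_s - y_{s+1})\,dy_s = \phi(y_{s-1})\,\phi(y_{s+1}), \tag{6.4}$$

as is immediately seen on using the explicit form of the functions $\phi$.

Equation (6.3) then becomes

$$K_s(r) = \int_0^\infty\!\cdots\!\int_0^\infty \prod_{i=1}^{s-2}\phi(y_i - y_{i+1})\,\phi(y_{s-1})\,dy_1\ldots dy_{s-1} \times \int_0^\infty\!\cdots\!\int_0^\infty \Bigl(\sum_{i=s+1}^{r} y_i\Bigr)\phi(y_{s+1})\prod_{i=s+1}^{r}\phi(y_i - y_{i+1})\,dy_{s+1}\ldots dy_r. \tag{6.5}$$

Now in this expression the $(s-1)$-fold integral is equal to $F_{s-1}(0)$, as may be seen on applying the transformation (4.1) to (5.1).

The other factor in the last expression, an $(r-s)$-fold integral, we call $G_{r-s}$. Thus

$$G_k = \int_0^\infty\!\cdots\!\int_0^\infty \Bigl(\sum_{i=1}^{k} y_i\Bigr)\phi(y_1)\,\phi(y_1 - y_2)\cdots\phi(y_{k-1} - y_k)\,\phi(y_k)\,dy_1\ldots dy_k. \tag{6.6}$$

We now proceed to show that this is expressible in terms of an integral previously evaluated by Anis & Lloyd (1953):

$$E_s = \int_0^\infty\!\cdots\!\int_0^\infty \phi(y_1)\,\phi(y_1 - y_2)\cdots\phi(y_{s-1} - y_s)\,\phi(y_s)\,dy_1\ldots dy_s = (2\pi)^{-1/2}(s+1)^{-3/2}. \tag{6.7}$$

We use the identity

$$\sum_{i=1}^{k} y_i = \sum_{r=1}^{k}\tfrac12 r(k+1-r)\,(2y_r - y_{r-1} - y_{r+1}), \tag{6.8}$$

where, by convention, we take $y_0 = y_{k+1} = 0$.

Equation (6.6) becomes

$$G_k = \sum_{r=1}^{k}\tfrac12 r(k+1-r)\int_0^\infty\!\cdots\!\int_0^\infty \phi(y_1)\,\phi(y_1 - y_2)\cdots(2y_r - y_{r-1} - y_{r+1})\cdots\phi(y_{k-1} - y_k)\,\phi(y_k)\,dy_1\ldots dy_k. \tag{6.9}$$

We now use (6.4) to carry out one of the integrations, thus reducing the k-fold integral to

$$\int_0^\infty\!\cdots\!\int_0^\infty \phi(y_1)\,\phi(y_1 - y_2)\cdots\phi(y_{r-2} - y_{r-1})\,\phi(y_{r-1})\;\phi(y_{r+1})\,\phi(y_{r+1} - y_{r+2})\cdots\phi(y_{k-1} - y_k)\,\phi(y_k)\,dy_1\ldots dy_{r-1}\,dy_{r+1}\ldots dy_k = E_{r-1}E_{k-r}. \tag{6.10}$$

Gathering the results of this section together, we have from (6.2), (6.5), (6.9) and (6.10),

$$H_r = 2\sum_{s=1}^{r-1} K_s(r), \quad K_s(r) = F_{s-1}(0)\,G_{r-s}, \quad G_k = \sum_{r=1}^{k}\tfrac12 r(k+1-r)\,E_{r-1}E_{k-r}, \quad E_r = (2\pi)^{-1/2}(r+1)^{-3/2}. \tag{6.11}$$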
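Two steps of this section lend themselves to a quick numerical check: the identity (6.8), and the reduction of $G_k$ in (6.11) to a single sum of inverse square roots. The sketch below is an illustration with hypothetical function names; it checks the algebra only and does not evaluate any of the multiple integrals.

```python
from math import pi


def second_difference_sum(y):
    """Right-hand side of (6.8), with the convention y_0 = y_{k+1} = 0."""
    k = len(y)
    p = [0.0] + list(y) + [0.0]
    return sum(0.5 * r * (k + 1 - r) * (2 * p[r] - p[r - 1] - p[r + 1])
               for r in range(1, k + 1))


def e_value(s):
    """E_s = (2 pi)^(-1/2) (s + 1)^(-3/2), as in (6.7)."""
    return (2.0 * pi) ** -0.5 * (s + 1) ** -1.5


def g_from_e(k):
    """G_k assembled from the E's as in (6.11)."""
    return sum(0.5 * r * (k + 1 - r) * e_value(r - 1) * e_value(k - r)
               for r in range(1, k + 1))


def g_closed(k):
    """G_k written as (1 / 4 pi) times the sum of {j (k - j + 1)}^(-1/2)."""
    return sum((j * (k - j + 1)) ** -0.5 for j in range(1, k + 1)) / (4.0 * pi)


y = [0.3, 1.7, 0.2, 2.5, 0.9]
print(sum(y), second_difference_sum(y))
print(g_from_e(4), g_closed(4))
```

Each printed pair should agree to rounding error.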




7. CONCLUSION 
We have found (3.2) that the second moment of the maximum of the partial sums $X_1, X_1 + X_2, \ldots, X_1 + \ldots + X_n$ is

$$\mu_2'(n) = 1 + \sum_{r=1}^{n-1} g_r F_{n-r-1}(0),$$

where

$$g_r = rF_r(0) + 2\sum_{s=1}^{r-1} F_{s-1}(0)\,G_{r-s}.$$

Thus

$$\mu_2'(n) = 1 + \sum_{r=1}^{n-1} rF_r(0)\,F_{n-r-1}(0) + 2\sum_{r=1}^{n-1}\sum_{s=1}^{r-1} F_{s-1}(0)\,G_{r-s}\,F_{n-r-1}(0).$$

Short table of first and second moments about zero, and standard deviation, of the maximum of the partial sums of n independent standard normal deviates

 n    μ₁'      μ₂'       σ
 3   0.6810   2.1592   1.3021
 4   0.9114   2.8842   1.4311
 5   1.1108   3.6476   1.5536
 6   1.2893   4.4367   1.6657
 7   1.4521   5.2446   1.7709
 8   1.6029   6.0671   1.8702
 9   1.7440   6.9013   1.9647
10   1.8769   7.7451   2.0548
11   2.0031   8.5971   2.1412
12   2.1234   9.4560   2.2242
13   2.2385  10.3210   2.3043
14   2.3492  11.1914   2.3817
15   2.4558  12.0665   2.4567
16   2.5588  12.9459   2.5295
17   2.6585  13.8292   2.6002
18   2.7553  14.7160   2.6691
19   2.8493  15.6060   2.7363
20   2.9409  16.4989   2.8018
21   3.0301  17.3946   2.8659
22   3.1171  18.2928   2.9285
23   3.2022  19.1934   2.9899
24   3.2854  20.0962   3.0500
25   3.3668  21.0010   3.1090


The second term on the right-hand side equals $\tfrac12(n-1)$, by Lemma 3, and the third term reduces, with the aid of Lemma 1, to $2\sum_{r=1}^{n-2} G_r$. So we finally obtain

$$\mu_2'(n) = \tfrac12(n+1) + 2\sum_{r=1}^{n-2} G_r,$$

where, by (6.11),

$$G_r = \frac{1}{4\pi}\sum_{j=1}^{r}\{j(r-j+1)\}^{-1/2}.$$

Finally, then,

$$\mu_2'(n) = \tfrac12(n+1) + \frac{1}{2\pi}\sum_{r=1}^{n-2}\sum_{s=1}^{r}\{s(r-s+1)\}^{-1/2}. \tag{7.1}$$
r=ls=1 


From the point of view of numerical evaluation it is fortunate that the individual terms of the summand in (7.1) are independent of n; computations carried out for a given value of n can be immediately utilized for larger values of n. A short table of specimen values is appended.
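The tabulated values can be recomputed from the expression for $E(U_n)$ quoted in §1 and from (7.1). The sketch below is a direct, unoptimised transcription with hypothetical function names; it recomputes the double sum for each n and does not exploit the reuse of terms noted above.

```python
from math import pi, sqrt


def mean_max(n):
    """E(U_n) = (2 pi)^(-1/2) * sum_{s=1}^{n-1} s^(-1/2)."""
    return sum(s ** -0.5 for s in range(1, n)) / sqrt(2.0 * pi)


def second_moment_max(n):
    """Second moment of U_n about zero, from (7.1)."""
    double_sum = sum((s * (r - s + 1)) ** -0.5
                     for r in range(1, n - 1)
                     for s in range(1, r + 1))
    return 0.5 * (n + 1) + double_sum / (2.0 * pi)


def sd_max(n):
    """Standard deviation of U_n."""
    return sqrt(second_moment_max(n) - mean_max(n) ** 2)


for n in (3, 4, 5):
    print(n, round(mean_max(n), 4), round(second_moment_max(n), 4), round(sd_max(n), 4))
```

The printed rows can be compared with the first rows of the short table.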
Values corresponding to very high n may be approximated by a result of Erdős & Kac (1946), who gave the limiting distribution of $\theta_n = n^{-1/2}U_n$ as

$$\lim_{n\to\infty}\Pr\{\theta_n < \alpha\} = \sqrt{\frac{2}{\pi}}\int_0^{\alpha}\exp(-\tfrac12 z^2)\,dz \quad (\alpha > 0),$$

$$= 0 \quad (\alpha < 0).$$

The limiting second moment of $\theta_n$ is thus unity, and the asymptotic second moment of $U_n$ is n.

Our results are in agreement with this. We may approximate the double sum in (7.1) by the double integral

$$\int_{y=\frac12}\int_{x=\frac12}\{x(y+1-x)\}^{-1/2}\,dx\,dy.$$

If we reverse the order of integration, this may easily be evaluated explicitly to give an expression which, to terms of order $n^{1/2}$, reduces to

$$\pi n - 2(2+\sqrt2)\,n^{1/2}.$$

Thus

$$\mu_2' \sim n - \frac{2+\sqrt2}{\pi}\,n^{1/2}.$$

For n = 25 this gives $\mu_2' \simeq 19.6$; the correct value is $\mu_2' = 21.001$.
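The quoted figure for n = 25 can be reproduced with a one-line evaluation. The coefficient $(2+\sqrt2)/\pi$ used below is an assumption: it is the reading of the asymptotic expression that yields the stated value of about 19.6. The function name is hypothetical.

```python
from math import pi, sqrt


def second_moment_asymptotic(n):
    """Assumed asymptotic form: n - ((2 + sqrt(2)) / pi) * sqrt(n)."""
    return n - (2.0 + sqrt(2.0)) / pi * sqrt(n)


print(round(second_moment_asymptotic(25), 1))
```

The shortfall relative to the exact 21.001 reflects the terms of order unity dropped from the approximation.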


The author wishes to acknowledge his debt to the referee for the asymptotic results 
given in the last paragraph. 


REFERENCES 


Anis, A. A. & Lloyd, E. H. (1953). On the range of partial sums of a finite number of independent normal variates. Biometrika, 40, 35.

Erdős, P. & Kac, M. (1946). On certain limit theorems of the theory of probability. Bull. Amer. Math. Soc. 52, 292.




SPATIAL POINT PROCESSES, WITH APPLICATIONS TO ECOLOGY 


By H. R. THOMPSON
Applied Mathematics Laboratory, D.S.I.R., Wellington, New Zealand 


1. INTRODUCTION 


The term 'point processes', referring to stochastic processes in which events occur at more
or less irregular intervals and which are represented by points on the time-axis, is of com- 
paratively recent origin, although the existence of such processes has in fact been well 
known for a long time. They have been discussed fairly extensively in such diverse applica- 
tions as the counting of radioactive impulses, telephone calls and cases of contagious 
diseases. Wold (1949) developed a statistical theory for treating processes of this type, and 
also mentioned briefly how the events could take place in a two-dimensional or higher field. 
Such a generalization, from events with no time extension to those with no ‘space’ exten- 
sion (i.e. specifically of a point character), has a suitable field of application in the ecological 
study of the distributional pattern of plants. If we can assume to a first approximation that 
the plants have the dimensions of a point, then we shall see that it is possible to discuss 
precisely probability relationships between the numbers of plants in different areas of the 
region under investigation. 

The main aims of quantitative ecology are the precise description of a community of 
plants with interpretations in terms of the biology of the species, and the correlation of 
vegetational and environmental data, and ecologists have used several methods in an 
attempt to achieve these aims. In most of the initial work on field sampling for ecological 
data, the procedure was to take ‘quadrats’ (sample areas small in relation to the total 
area of the region) scattered at random over the area, and study statistics derived from the 
frequency distribution of the numbers of plants per quadrat. While this approach is useful 
to some extent, in that any given type of distribution function may be fitted to the data, 
it does not necessarily furnish the kind of information required by an ecologist. It will 
not give any evidence of trends, or indicate the pattern of the distribution over the area 
or the way in which this pattern may have arisen, all factors of prime importance in the 
study of the structure of a plant community. We only have to cite the negative binomial 
distribution, which is known to arise in at least four different ways, all based on widely 
differing assumptions, to illustrate this point.

In recent years ecologists have become aware of the need for a more satisfactory approach
to the problem, and Greig-Smith (1952) provided a potentially great advance on the statis- 
tical side when he recommended the use of a grid of contiguous quadrats over some portion 
or portions of the region. The advantage, of course, in arranging the quadrats in a grid is 
that the analysis of variance technique may be employed, either for the detection of trends, 
or, more importantly, for the detection of a mosaic variation in density (due to ecological 
causes connected with the spread of the plants) by a ‘nested sampling’ type of analysis of 
variance, associating the quadrats into successively larger blocks and comparing the 
component block variances. The details and applications of this method are described at 
length by Greig-Smith, together with the results from sampling experiments on artificial 




plant communities. We are not concerned here with discussing the ecological implications 
of the method (for which the reader is referred to a paper to appear elsewhere (Thompson, 
1955)), but with the application of point process techniques to deriving the probability 
relations between the numbers of plants in the quadrats of a grid, required particularly 
in the analysis of variance and its sampling theory. 

Wold's treatment of one-dimensional point processes is based on the distribution of the time interval between successive events, which could be either independent, or dependent on the previous interval, in which case the occurrence of an event would depend on the times of occurrence of the two preceding events. Given A(x, y) dt, equal to the joint probability that an event has occurred in (t, t + dt) with the two immediately preceding events at instants $t_1$, $t_2$, where $x = t - t_1$, $y = t_1 - t_2$, Wold derives integral equations for the distribution functions F(x, y), which is the conditional probability that an event at t is followed by an interval (t, t + x), given that the immediately preceding event occurred at t - y, and G(x), the absolute probability that an event at t is followed by an event in (t, t + x). From these he obtains the probability $S(\nu, T)$ of exactly $\nu$ events in an arbitrary interval of length T. The definition of A expresses the fact that the process is stationary, for A remains the same if the set of events undergoes a translation along the time-axis.

This type of approach is adequate for one-dimensional fields because, by means of it, a 
process in which the course of events depends on all the prehistory up to a given time can 
be completely specified (apart from random variation); but it does not easily extend to 
more than one dimension. It seems in fact impossible to specify a stochastic process in 
two dimensions by means of a similar dependence of events. The difficulty is now (at least 
in the ecological application) that we are studying a function F(x, y, T) say of the two
space variables x, y and the time variable t, for a given value T of t, and we are given,
specified by means of their space co-ordinates, a set of events which may have occurred
separately at any time. Although the development of the process occurs along a time-axis,
with events (i.e. new plants) occurring at specified times, it is mainly the consideration of 
the spatial pattern with which we are concerned. 

For the purpose of finding probability relations between the numbers of plants in neighbouring finite areas, it is convenient to use the method of 'continuous parameter' stochastic processes, developed in connexion with physical problems such as electron cascades, the main contribution being jointly by Bhabha (1950) and Ramakrishnan (1950). They considered the case of particles with specific energies in a continuous range, i.e. effectively point processes in one dimension. Probability relations between the numbers of particles in non-overlapping infinitesimal ranges are defined by density functions of different orders, called by Ramakrishnan 'product densities', and the integrals of these over finite ranges yield linear functions of the product moments of the total numbers of particles in the finite ranges. The required mathematical treatment is obtained by considering a function of the continuous parameter E, n(E) say, which equals 0 everywhere except at a finite number of points $E_1, E_2, \ldots, E_k$ say (at which events occur), where n(E) = 1. With suitable modifications this is the approach adopted in this paper, and in the next section the theory of spatial point processes is recapitulated in the most convenient manner. As in Wold, we assume that the processes are stationary in the statistical sense, the essential character being unaltered by any translations of the axes. This is a quite reasonable assumption for plant communities, where trends can often be assumed to be absent.




2. SPECIFICATION OF POINT STOCHASTIC PROCESSES IN TWO DIMENSIONS 


The distribution of plants over a region is described by a continuous parametric system, whose states are labelled by the two continuous variables x and y, the co-ordinates of position of a plant with respect to a given origin. Thus a given pair of values (x, y), which will alternatively be denoted by the parameter A, regarded as a vector, is taken to imply the occurrence of a plant in the infinitesimal rectangle with lower left-hand corner (x, y) and sides of length dx, dy. The infinitesimal element of area dx dy will likewise be denoted by the vector dA. A realization of such a system will consist of a number of isolated points, k say, denoted by the parameter values $A_1, A_2, \ldots, A_k$. All states of the system are covered by allowing each $A_i$ to range over the whole domain of A, and letting k = 0, 1, 2, ..., to give all possible samples. With each possible sample is associated a probability $\Pi(A_1, A_2, \ldots, A_k)$ and a rigorous treatment of continuous parametric systems is obtained (see Bhabha, 1950) by a precise mathematical definition of the probability $\Pi$, using set theory. For the purposes of this paper a more direct approach, as given by Ramakrishnan (1950), is sufficient.

Let N(A) denote the number of plants whose positions are below and to the left of the value A = (x, y), i.e. whose parametric values are less than A. Then we can consistently write

$$N(A) = \int dN(A),$$

and regard dN(A) as denoting the number of plants in the infinitesimal rectangle, area dA, situated at A. Assume that the probability of one plant in dA is m dA, and the probability of more than one is o(dA). Hence dN(A) is a variable assumed to take only the values 0 or 1, and it follows that all moments of dN(A) are equal to the probability that it takes the value 1. Also the consideration of probability relations between different dN(A) is very much simplified, for a contribution to the product moment $E\{dN(A_1)\ldots dN(A_k)\}$ will only occur when there is a plant in each $dA_i$, and the contribution is then simply unity. If any one of the $dA_i$ contains no plant, the contribution is zero.


The probability relations between the numbers $dN(A)$ in the small areas $dA$ are defined
by functions of the form

$$f_k(A_1, A_2, \ldots, A_k)\,dA_1\,dA_2 \ldots dA_k = E\{dN(A_1)\,dN(A_2) \ldots dN(A_k)\}, \tag{2·1}$$

which represents the joint probability that there is one plant in $dA_1$, one in $dA_2$, ..., one in
$dA_k$, when $dA_1, dA_2, \ldots, dA_k$ are all separate non-overlapping areas. $f_k$ is called a product
density of degree $k$. $f_1(A)\,dA = E\{dN(A)\} = m\,dA$, and it may be noted that the integral
of $f_1$ yields only the mean number of plants in the area of integration, as the addition rule
of probabilities does not apply, the events in general not being mutually exclusive. When
two of the $dA_i$ are the same, a product density of degree $k$ becomes one of degree $k - 1$.
For if $dA_{k-1} = dA_k$,

$$\begin{aligned}
E\{dN(A_1)\,dN(A_2) \ldots dN(A_{k-1})^2\} &= E\{dN(A_1)\,dN(A_2) \ldots dN(A_{k-1})\} \\
&= f_{k-1}(A_1, A_2, \ldots, A_{k-1})\,dA_1\,dA_2 \ldots dA_{k-1}.
\end{aligned}$$


It is degeneracies of this type that lead to the following theorem, given by Bhabha, for the
product moment of the numbers $N(A_i') - N(A_i)$, denoted by $N_{[i]}$, in finite areas $[A_i, A_i']$
($i = 1, 2, \ldots, k$). $A_i$ is the lower left-hand point, $A_i'$ the upper right-hand point of the area,


Н. В. THOMPSON 105 


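which is assumed to be rectangular. The $k$ areas are denoted by the numbers $[1], [2], \ldots, [k]$,
and any number of them may be the same area:

$$\begin{aligned}
E\{N_{[1]} N_{[2]} \ldots N_{[k]}\} ={}& \int_{[1,\ldots,k]} f_1(A_1)\,dA_1
 + \sum \int_{[1,\ldots,r]}\int_{[r+1,\ldots,k]} f_2(A_1, A_2)\,dA_1\,dA_2 \\
&+ \sum \int_{[1,\ldots,r]}\int_{[r+1,\ldots,s]}\int_{[s+1,\ldots,k]} f_3(A_1, A_2, A_3)\,dA_1\,dA_2\,dA_3 \\
&+ \ldots + \int_{[1]}\int_{[2]}\ldots\int_{[k]} f_k(A_1, A_2, \ldots, A_k)\,dA_1\,dA_2 \ldots dA_k,
\end{aligned} \tag{2·2}$$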
where 

(1) by $[1, \ldots, r]$ is meant the equality of the $r$ areas $[1], \ldots, [r]$, if in fact they are equal.
If they are not, the terms containing $[1, \ldots, r]$ do not appear in the expression;

(2) the summation sign before each term denotes a summation over like terms in which
the integers $1, \ldots, k$ are distributed in all possible ways between the brackets affixed to the
integrals, there being no distinction between the order of the integers in a bracket or between
the order of the brackets. For example, with $k = 4$, integrals of the type of the second term
are summed with the groupings of the integers given by

[1][234], [2][341], [3][412], [4][123], [12][34], [13][24], [14][23],

of the third term by

[1][2][34], [1][3][24], [1][4][23], [2][3][14], [2][4][13], [3][4][12].

Thus, if the four areas are the same area [1] say, we have for the expectation of the fourth
power of the number in [1]

$$E\{N_{[1]}^4\} = \int_{[1]} f_1 + 7\int_{[1]}\int_{[1]} f_2 + 6\int_{[1]}\int_{[1]}\int_{[1]} f_3 + \int_{[1]}\int_{[1]}\int_{[1]}\int_{[1]} f_4,$$

using a condensed notation. If there is more than one area to consider, the expression is
modified; for example,


sooo = fe „ЬУ nd lu 


The product densities may consistently be regarded as factorial-moment densities (cf.
Bartlett, 1954). For, if we consider the expression for $E\{N_{[1]}^r\}$, the coefficient of the term
containing the product density of degree $s$ is the sum of the coefficients of all terms

$$\sum \ldots \sum dN(A_1)^{\alpha_1}\,dN(A_2)^{\alpha_2} \ldots dN(A_s)^{\alpha_s}$$

for a given $s$ and all different combinations of $\alpha_1, \alpha_2, \ldots, \alpha_s$ subject to the condition
$\sum_{i=1}^{s} \alpha_i = r$. That is, the coefficient of $\int_{[1]} \ldots \int_{[1]} f_s$ in $E\{N_{[1]}^r\}$ is
$\sum r!/(\alpha_1!\,\alpha_2! \ldots \alpha_s!)$, the summation being over all possible different combinations
of the $\alpha_i$. Stevens (1937), in a different connexion, proves that this coefficient is of the
general form $\Delta^s 0^r/s! = C_s^r$ say, where $\Delta^s 0^r$ ($s = 1, 2, \ldots, r$) is the $s$th leading
difference of the $r$th powers of the natural numbers. Thus we have

$$E\{N_{[1]}^r\} = C_1^r \int_{[1]} f_1(A_1)\,dA_1 + C_2^r \int_{[1]}\int_{[1]} f_2(A_1, A_2)\,dA_1\,dA_2
 + \ldots + C_r^r \int_{[1]}\ldots\int_{[1]} f_r(A_1, \ldots, A_r)\,dA_1 \ldots dA_r.$$




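One of Stevens's results is, in our notation,

$$N_{[1]}^r = C_1^r N_{[1]} + C_2^r N_{[1]}(N_{[1]} - 1) + \ldots + C_r^r N_{[1]}(N_{[1]} - 1) \ldots (N_{[1]} - r + 1).$$

A simple inductive argument then shows that

$$\int_{[1]} \ldots \int_{[1]} f_k(A_1, \ldots, A_k)\,dA_1 \ldots dA_k = E\{N_{[1]}(N_{[1]} - 1) \ldots (N_{[1]} - k + 1)\}.$$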
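The coefficients of leading differences quoted from Stevens can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the coefficient of the degree-$s$ term is $\Delta^s 0^r/s!$ as stated above. It computes these coefficients and verifies that powers are recovered from falling factorials.

```python
from math import comb, factorial


def leading_difference(s, r):
    """Delta^s 0^r: the s-th leading difference of 0^r, 1^r, 2^r, ... (r >= 1)."""
    return sum((-1) ** (s - j) * comb(s, j) * j ** r for j in range(s + 1))


def stevens_coefficient(r, s):
    """Coefficient Delta^s 0^r / s! of the degree-s term (always an integer)."""
    return leading_difference(s, r) // factorial(s)


def falling_factorial(n, s):
    """n (n - 1) ... (n - s + 1)."""
    out = 1
    for i in range(s):
        out *= n - i
    return out


def power_from_falling_factorials(n, r):
    """Right-hand side of N^r = sum_s C_s^r N (N - 1) ... (N - s + 1)."""
    return sum(stevens_coefficient(r, s) * falling_factorial(n, s)
               for s in range(1, r + 1))


# Coefficients for the fourth power, s = 1, ..., 4.
coefficients_r4 = [stevens_coefficient(4, s) for s in range(1, 5)]
```

For $r = 4$ the middle coefficients are the 7 and 6 that multiply the second- and third-degree integrals in the expectation of the fourth power.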


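For a process to be stationary we require that the covariance of the numbers in two areas
is independent of their absolute position. Thus, for two areas [1], [2],

$$\begin{aligned}
\operatorname{cov}(N_{[1]}, N_{[2]}) &= E[\{N_{[1]} - E(N_{[1]})\}\{N_{[2]} - E(N_{[2]})\}] \\
&= E\{N_{[1]} N_{[2]}\} - E\{N_{[1]}\}\,E\{N_{[2]}\} \\
&= \int_{[1]}\int_{[2]} \{f_2(A_1, A_2) - m^2\}\,dA_1\,dA_2, \quad \text{from (2·2)}.
\end{aligned}$$

This implies that

$$f_2(A_1, A_2) - m^2 = \theta(A_2 - A_1),$$

so that $f_2(A_1, A_2)$ is a function of the differences $x_2 - x_1$, $y_2 - y_1$.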

In practical applications the calculation of the product density of degree $k$ will generally
involve a tabulation of the different ways of getting a plant in each of the $k$ infinitesimal
areas, due usually to the presence of different classes of plants (different groups or families
of plants, different generations, etc.). $f_k$ is then obtained as the sum of the (mutually
exclusive) probabilities for all these possible alternative cases, according to the addition
rule of probabilities.


3. ANALYSIS OF VARIANCE ON A GRID OF QUADRATS 


The individual terms of the analysis of variance being considered in this paper are the sums 
of squares within blocks of a given size and between blocks of the next lowest size. The most 
useful type of grid to employ is one in which the number of quadrats is a power of 2, leading 
to blocks of 2,4,8,... quadrats and consequently more information, relative to the size of 
the grid, than any other arrangement of blocks. The total size of the grid is quite arbitrary; 
for practical and theoretical purposes it has been fixed here at 256 quadrats, arranged in a 
16 x 16 square. With a suitable choice of quadrat size, the important ecological effects 
should be found in the terms for the smaller-sized blocks, which have a fairly large number 
of degrees of freedom and are consequently subject to less fluctuation. 

From the definition above, we see that if $B_{i,k}$ denotes the total number of plants in the
$i$th block of $k$ quadrats, then $S_k$, the sum of squares within blocks of $k$ quadrats and between
blocks of $\tfrac12 k$ quadrats, is given by

$$S_k = \frac{2}{k}\sum_{i=1}^{512/k} B_{i,k/2}^2 - \frac{1}{k}\sum_{i=1}^{256/k} B_{i,k}^2, \tag{3·1}$$

from which $S_k$ has $n_k = 256/k$ degrees of freedom. This is ordinarily the most convenient
practical way of calculating individual terms, but for deriving an expected analysis of
variance and its sampling errors for any given theoretical model we need to work in terms
of the numbers in a single quadrat. Consider now the moments of $S_k$ for a given $k$
($k = 2, 4, 8, \ldots, 256$). A three-suffix notation for the quadrat numbers is the most
convenient for this purpose.

Let $N_{ijl}$ be the observed number of plants in quadrat $(ijl)$ of the grid, where $i$ ($= 1, 2, \ldots,
256/k$) denotes a primary block of $k$ quadrats, $j$ ($= 1, 2$) a sub-block of $\tfrac12 k$ quadrats, and
$l$ ($= 1, 2, \ldots, \tfrac12 k$) a quadrat of a sub-block. We assume that blocks of the same size always
have identical shapes. For $k = 2^{2p+2}$ ($p = 0, 1, 2, 3$) blocks are taken to be squares of side
$2^{p+1}$ quadrats, while for $k = 2^{2p+1}$ they are assumed rectangular, of dimensions $2^p \times 2^{p+1}$
quadrats, with the longer length always in the same direction.

From (3·1), transforming $B_{i,k}$ to a sum of the $N_{ijl}$, we obtain

$$\begin{aligned}
S_k &= \frac{2}{k}\sum_i\sum_j\Bigl(\sum_l N_{ijl}\Bigr)^2 - \frac{1}{k}\sum_i\Bigl(\sum_j\sum_l N_{ijl}\Bigr)^2 \\
&= \frac{1}{k}\sum_i\Bigl(\sum_l N_{i1l} - \sum_l N_{i2l}\Bigr)^2.
\end{aligned} \tag{3·2}$$
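As an illustration of the computation in (3·1), the following sketch evaluates every $S_k$ on a 16 x 16 grid of counts. It is not the author's program: the helper names are hypothetical, the longer side of rectangular blocks is arbitrarily taken along the columns, and the counts come from a pseudo-random generator purely for demonstration.

```python
import random

GRID = 16  # the grid is 16 x 16 quadrats, as in the text


def block_shape(k):
    """(rows, cols) of a block of k = 2**p quadrats; longer side along columns."""
    p = k.bit_length() - 1
    return 2 ** (p // 2), 2 ** ((p + 1) // 2)


def block_totals(grid, k):
    """Totals B_{i,k} over the tiling of the grid by blocks of k quadrats."""
    rows, cols = block_shape(k)
    return [sum(grid[r][c]
                for r in range(r0, r0 + rows)
                for c in range(c0, c0 + cols))
            for r0 in range(0, GRID, rows)
            for c0 in range(0, GRID, cols)]


def sum_of_squares(grid, k):
    """S_k of (3.1): within blocks of k quadrats, between blocks of k/2."""
    half = block_totals(grid, k // 2)
    full = block_totals(grid, k)
    return 2.0 / k * sum(b * b for b in half) - 1.0 / k * sum(b * b for b in full)


def random_grid(seed, max_count=5):
    """Demonstration data only: independent uniform counts in each quadrat."""
    rng = random.Random(seed)
    return [[rng.randint(0, max_count) for _ in range(GRID)] for _ in range(GRID)]


grid = random_grid(seed=1)
terms = {k: sum_of_squares(grid, k) for k in (2, 4, 8, 16, 32, 64, 128, 256)}
```

Because successive terms telescope, the eight values of $S_k$ add up to the total sum of squares of the 256 quadrat counts about their mean, which gives a quick check on any implementation.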


From (3·2), by long and tedious but otherwise simple algebra, the higher powers of $S_k$ may
be obtained. We quote only $S_k^2$ in full:


St = ГЕ (Б War- Nat X (Ou - Na (2 Ui New 


= > z D Nitt D №№ 450 
+3 Мами 3 ХММ 
8 6 ae Nar + ds Ni Nor 
nt 12 E Nin Ne Nar б „Хи Nor Nor Nir 


„ММ Мид — 4 Nea Ni Nip N; 
+3 жу. УЖЕ ЛЕ АЛ NM iar Nor Мат} 


1 
= Na N? + 2 X, Nig Noye Ne 
+p, EQ i Net? Y Ми Мут т 


= №2 „М Noye + 1,1,3. Ne 
42. ia Neg Мут "m iae Nese Were 
„и Nopp — 8 Noa Nige Мул Ner) (3:3) 
OUT. „Мит Ма Ner о плут Vern 22 


LU, 


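where $i, i'$; $j, j'$; $l, l', l'', l'''$ are varied independently over their whole respective ranges
unless otherwise stated.

For the mean and variance of $S_k$, we take expected values of (3·2) and (3·3), and see that
in the first case

$$E(S_k) = \frac{1}{k}\sum_i\sum_j\sum_l E\{N_{ijl}^2\}
 + \frac{1}{k}\sum_i\sum_j\sum_{l \neq l'} E\{N_{ijl}N_{ijl'}\}
 - \frac{2}{k}\sum_i\sum_l\sum_{l'} E\{N_{i1l}N_{i2l'}\}, \tag{3·4}$$

and is only a linear combination of terms like $E\{N_{[1]}^2\}$, $E\{N_{[1]}N_{[2]}\}$, while $E\{S_k^2\}$ will
be a linear combination of terms like $E\{N_{[1]}^4\}$, $E\{N_{[1]}^3 N_{[2]}\}$, $E\{N_{[1]}^2 N_{[2]}^2\}$,
$E\{N_{[1]}^2 N_{[2]} N_{[3]}\}$, $E\{N_{[1]} N_{[2]} N_{[3]} N_{[4]}\}$.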

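The expressions for these expectations, obtained from (2·2), are given below, with
$f_k(A_1, \ldots, A_k)\,dA_1 \ldots dA_k$ written $f_k$ for short:

$$\begin{aligned}
E\{N_{[1]}^2\} &= \int_{[1]} f_1 + \int_{[1]}\int_{[1]} f_2, \qquad
E\{N_{[1]}N_{[2]}\} = \int_{[1]}\int_{[2]} f_2, \\
E\{N_{[1]}^4\} &= \int_{[1]} f_1 + 7\int_{[1]}\int_{[1]} f_2 + 6\int_{[1]}\int_{[1]}\int_{[1]} f_3
 + \int_{[1]}\int_{[1]}\int_{[1]}\int_{[1]} f_4, \\
E\{N_{[1]}^3N_{[2]}\} &= \int_{[1]}\int_{[2]} f_2 + 3\int_{[1]}\int_{[1]}\int_{[2]} f_3
 + \int_{[1]}\int_{[1]}\int_{[1]}\int_{[2]} f_4, \\
E\{N_{[1]}^2N_{[2]}^2\} &= \int_{[1]}\int_{[2]} f_2 + \int_{[1]}\int_{[1]}\int_{[2]} f_3
 + \int_{[1]}\int_{[2]}\int_{[2]} f_3 + \int_{[1]}\int_{[1]}\int_{[2]}\int_{[2]} f_4, \\
E\{N_{[1]}^2N_{[2]}N_{[3]}\} &= \int_{[1]}\int_{[2]}\int_{[3]} f_3
 + \int_{[1]}\int_{[1]}\int_{[2]}\int_{[3]} f_4, \\
E\{N_{[1]}N_{[2]}N_{[3]}N_{[4]}\} &= \int_{[1]}\int_{[2]}\int_{[3]}\int_{[4]} f_4.
\end{aligned} \tag{3·5}$$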


4. APPLICATION OF THE THEORY TO AN ECOLOGICAL MODEL 


The simplest assumption mathematically that can be made about a community of plants
is that they are distributed at random in the Poisson distribution. Such a distribution might
conceivably arise when an area is first invaded by wind-borne seeds, but it is hardly ever
encountered in practice. A more realistic and more useful model is obtained by allowing
plants distributed in this manner, i.e. at random, to become the centres of distribution
('parents') of a generation of 'offspring' plants, whose positions depend on those of their
respective parents. The offspring of a given parent are assumed to be distributed independently
of each other, the distances $r$ from parent to offspring following the isotropic normal
distribution

$$f(r)\,dr = e^{-r^2/2\sigma^2}\,dr/(2\pi\sigma^2), \tag{4·1}$$

and the numbers $n$ of offspring per parent following an arbitrary probability law $p(n)$.
We consider for simplicity the distribution of the offspring only, for then the only distances
which have to be taken into account are those between offspring of the same group, and
these are also isotropic normal, but with parameter $\sqrt{2}\,\sigma$. For, $f(r)\,dr$ is the product of two
independent components $e^{-x^2/2\sigma^2}\,dx/\sqrt{(2\pi\sigma^2)}$, $e^{-y^2/2\sigma^2}\,dy/\sqrt{(2\pi\sigma^2)}$ along rectangular
axes, and the distance between two offspring projected on either of these axes is merely the
sum of two independent distances (from the parent) following a one-dimensional normal
distribution.

The product density $f_k$ of degree $k$ is found (§2) as the sum of mutually exclusive
probabilities, denoted by $\Pr(dA_1, \ldots, dA_k)$, for all the possible alternatives with $k$ different
areas $dA_1, \ldots, dA_k$. Here the contributions to $f_k$ will arise from all different combinations
of $k$ plants from $s$ groups ($s \le k$) such that there are $\alpha_1$ plants from the first group, ..., $\alpha_s$
from the $s$th, with $\sum_{i=1}^{s} \alpha_i = k$. Of these we need only consider the case $s = 1$, i.e. all $k$
plants from the same group; the other cases will merely result in products of lower order
product densities since the contributions from different groups are entirely independent. We




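prove now that the joint probability of $k$ offspring of the same group occurring in
$dA_1, \ldots, dA_k$ is

$$\Pr(dA_1, \ldots, dA_k) = m_0\,E\{n(n-1) \ldots (n-k+1)\}\,\phi(A_1, \ldots, A_k)\,dA_1 \ldots dA_k, \tag{4·2}$$

where

$$\phi(A_1, \ldots, A_k) = \frac{1}{k}\Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{k-1}
 \exp\Bigl\{-\sum_{i<j}\frac{R_{ij}^2}{2k\sigma^2}\Bigr\}, \tag{4·3}$$

$$R_{ij}^2 = (x_i - x_j)^2 + (y_i - y_j)^2,$$

and $m_0$ is the mean density of the parent plants. The mean density of offspring plants is
$m = m_0 E\{n\}$.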
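Assume that the position of the parent plant of a group with a random number $n$ of
offspring is at $dA_0$. The probability of this is $m_0\,dA_0$.

The probability of $k$ ($\le n$) offspring of this parent in $dA_1, \ldots, dA_k$, distant $R_{01}, \ldots, R_{0k}$ from
$dA_0$, given the parent's position is at $dA_0$, is

$$n(n-1) \ldots (n-k+1)\Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{k}
 \exp\Bigl\{-\sum_{i=1}^{k}\frac{R_{0i}^2}{2\sigma^2}\Bigr\}\,dA_1 \ldots dA_k,$$

since there are $n$ possibilities for $dA_1$, $n - 1$ for $dA_2$, ..., $n - k + 1$ for $dA_k$, and each distance
is distributed independently. Therefore, averaging over $n$,

$$\begin{aligned}
\Pr(dA_0, dA_1, \ldots, dA_k) &= \Pr(dA_0)\,\Pr(dA_1, \ldots, dA_k \mid dA_0) \\
&= m_0\,E\{n(n-1) \ldots (n-k+1)\}\Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{k}
 \exp\Bigl\{-\sum_{i=1}^{k}\frac{R_{0i}^2}{2\sigma^2}\Bigr\}\,dA_0\,dA_1 \ldots dA_k \\
&= m_0\,E\{n(n-1) \ldots (n-k+1)\}\,\frac{k}{2\pi\sigma^2}
 \exp\Bigl[-\frac{k}{2\sigma^2}\{(x_0 - \bar{x})^2 + (y_0 - \bar{y})^2\}\Bigr]\,dA_0 \\
&\qquad \times \frac{1}{k}\Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{k-1}
 \exp\Bigl\{-\sum_{i=1}^{k}\sum_{j>i}\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2k\sigma^2}\Bigr\}\,dA_1 \ldots dA_k,
\end{aligned}$$

where $\bar{x} = \sum_{i=1}^{k} x_i/k$, $\bar{y} = \sum_{i=1}^{k} y_i/k$.

Integrating out with respect to $dA_0$ ($= dx_0\,dy_0$) over the whole region (assumed infinite), we
obtain (4·2). $\phi(A_1, \ldots, A_k)$ may be shown to be a product of two $(k-1)$-variate normal
distributions (in $x$ and $y$), the $k - 1$ variables being, for example, the differences of the first
$k - 1$ values of $x$ (or $y$) from the $k$th, so that each variable is distributed normally, mean
zero, variance $2\sigma^2$, and the correlation between any two variables is $\tfrac12$.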
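The integration over the parent position can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original derivation: the function names are hypothetical, the closed form is $\phi$ as written in (4·3), and the integral over the parent position is approximated by a midpoint rule on a square of half-width $8\sigma$ about the centroid of the offspring.

```python
import math


def isotropic_normal(dx, dy, sigma):
    """Two-dimensional isotropic normal density at displacement (dx, dy)."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def phi_closed_form(points, sigma):
    """phi(A_1, ..., A_k) from (4.3) for a list of (x, y) offspring positions."""
    k = len(points)
    squared = sum((xi - xj) ** 2 + (yi - yj) ** 2
                  for i, (xi, yi) in enumerate(points)
                  for (xj, yj) in points[i + 1:])
    return (2 * math.pi * sigma ** 2) ** (-(k - 1)) / k * math.exp(-squared / (2 * k * sigma ** 2))


def phi_numeric(points, sigma, cells=200, half_width=8.0):
    """Midpoint-rule integral over the parent position of the product of densities."""
    k = len(points)
    cx = sum(x for x, _ in points) / k
    cy = sum(y for _, y in points) / k
    step = 2 * half_width * sigma / cells
    total = 0.0
    for i in range(cells):
        x0 = cx - half_width * sigma + (i + 0.5) * step
        for j in range(cells):
            y0 = cy - half_width * sigma + (j + 0.5) * step
            product = 1.0
            for x, y in points:
                product *= isotropic_normal(x - x0, y - y0, sigma)
            total += product
    return total * step * step
```

For two points the closed form reduces to $e^{-R_{12}^2/4\sigma^2}/(4\pi\sigma^2)$, the isotropic normal density with parameter $\sqrt{2}\,\sigma$.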

For the mean values and variances of the set of sums of squares we require the product
densities of second, third and fourth degree. With two areas $dA_1$ and $dA_2$, there are only two
possibilities to consider; both areas might contain plants from different groups, in which
case the joint probability $\Pr(dA_1, dA_2)$ is $m^2\,dA_1\,dA_2$, or else the plants may be offspring
from the same group. We note that the case where one of the $dA_i$ contains no plant is


automatically excluded, since it makes no contribution to the second degree product
density. For the second case, we have from (4·2)

$$\Pr(dA_1, dA_2) = m_0\,E\{n(n-1)\}\,\phi(A_1, A_2)\,dA_1\,dA_2,$$

so that, adding the two mutually exclusive probabilities, we obtain

$$\begin{aligned}
f_2(A_1, A_2)\,dA_1\,dA_2 &= [m_0\,E\{n(n-1)\}\,\phi(A_1, A_2) + m^2]\,dA_1\,dA_2 \\
&= (m^2 + m g_2\,e^{-R_{12}^2/4\sigma^2}/4\pi\sigma^2)\,dA_1\,dA_2,
\end{aligned} \tag{4·4}$$

from (4·3), where

$$g_r = E\{n(n-1) \ldots (n-r+1)\}/E(n). \tag{4·5}$$


Now tabulate the possibilities for three plants in three different areas $dA_1, dA_2, dA_3$.
If the plants are all in different groups,

$$\Pr(dA_1, dA_2, dA_3) = m^3\,dA_1\,dA_2\,dA_3.$$

If $dA_1$, $dA_2$ are in a group $G$, $dA_3$ not in $G$, then since the contributions from different groups
are independent, we have

$$\Pr(dA_1, dA_2, dA_3) = m_0\,E\{n(n-1)\}\,\phi(A_1, A_2)\,dA_1\,dA_2 \cdot m\,dA_3.$$

If $dA_1$, $dA_2$, $dA_3$ are all in the same group,

$$\Pr(dA_1, dA_2, dA_3) = m_0\,E\{n(n-1)(n-2)\}\,\phi(A_1, A_2, A_3)\,dA_1\,dA_2\,dA_3.$$

This exhausts all the possibilities; therefore,

$$\begin{aligned}
f_3(A_1, A_2, A_3) &= m^3 + m^2 g_2\{\phi(A_1, A_2) + \phi(A_1, A_3) + \phi(A_2, A_3)\} + m g_3\,\phi(A_1, A_2, A_3) \\
&= m^3 + m^2 g_2\,(e^{-R_{12}^2/4\sigma^2} + e^{-R_{13}^2/4\sigma^2} + e^{-R_{23}^2/4\sigma^2})/4\pi\sigma^2 \\
&\qquad + m g_3\,e^{-(R_{12}^2 + R_{13}^2 + R_{23}^2)/6\sigma^2}/12\pi^2\sigma^4.
\end{aligned} \tag{4·6}$$

We omit the tabulation for the fourth degree case, as it is obvious from the expression

$$\begin{aligned}
f_4(A_1, A_2, A_3, A_4) ={}& m^4 + m^3 g_2\{\phi(A_1, A_2) + \phi(A_1, A_3) + \phi(A_1, A_4) + \phi(A_2, A_3) \\
&\qquad + \phi(A_2, A_4) + \phi(A_3, A_4)\} \\
&+ m^2 g_2^2\{\phi(A_1, A_2)\,\phi(A_3, A_4) + \phi(A_1, A_3)\,\phi(A_2, A_4) + \phi(A_1, A_4)\,\phi(A_2, A_3)\} \\
&+ m^2 g_3\{\phi(A_1, A_2, A_3) + \phi(A_1, A_2, A_4) + \phi(A_1, A_3, A_4) + \phi(A_2, A_3, A_4)\} \\
&+ m g_4\,\phi(A_1, A_2, A_3, A_4).
\end{aligned} \tag{4·7}$$

The product densities are integrated and substituted in (3·5) to give the expectations
needed in the calculation of $E(S_k)$, $E(S_k^2)$. An explicit solution has been found for the
integral of $f_2$ only, but product densities of all degrees may be integrated numerically. Let
the quadrats of the grid be squares of side $h$, and write

$$\int_{[1]}\int_{[2]} \ldots \int_{[k]} \phi(A_1, \ldots, A_k)\,dA_1 \ldots dA_k = h^2 I_{12 \ldots k}, \tag{4·8}$$




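where square brackets round each suffix of $I$ may be assumed if desired. Then equations
(3·5) become

$$\begin{aligned}
E\{N_{[1]}^2\} ={}& mh^2 + m^2h^4 + mh^2 g_2 I_{11}, \qquad E\{N_{[1]}N_{[2]}\} = m^2h^4 + mh^2 g_2 I_{12}, \\
E\{N_{[1]}^4\} ={}& mh^2 + 7m^2h^4 + 6m^3h^6 + m^4h^8 + mh^2(7 + 18mh^2 + 6m^2h^4)\,g_2 I_{11} \\
&+ 3m^2h^4 g_2^2 I_{11}^2 + mh^2(6 + 4mh^2)\,g_3 I_{111} + mh^2 g_4 I_{1111}, \\
E\{N_{[1]}^3N_{[2]}\} ={}& m^2h^4 + 3m^3h^6 + m^4h^8 + 3m^2h^4(1 + mh^2)\,g_2 I_{11} + m^2h^4 g_3 I_{111} \\
&+ mh^2(1 + 6mh^2 + 3m^2h^4)\,g_2 I_{12} + 3m^2h^4 g_2^2 I_{11}I_{12} \\
&+ 3mh^2(1 + mh^2)\,g_3 I_{112} + mh^2 g_4 I_{1112}, \\
E\{N_{[1]}^2N_{[2]}^2\} ={}& m^2h^4 + 2m^3h^6 + m^4h^8 + 2m^2h^4(1 + mh^2)\,g_2 I_{11} \\
&+ mh^2(1 + 4mh^2 + 4m^2h^4)\,g_2 I_{12} + m^2h^4 g_2^2 (I_{11}^2 + 2I_{12}^2) \\
&+ mh^2(1 + 2mh^2)\,g_3 (I_{112} + I_{122}) + mh^2 g_4 I_{1122}, \\
E\{N_{[1]}^2N_{[2]}N_{[3]}\} ={}& m^3h^6 + m^4h^8 + m^3h^6 g_2 I_{11} + (m^2h^4 + 2m^3h^6)\,g_2 (I_{12} + I_{13}) \\
&+ (m^2h^4 + m^3h^6)\,g_2 I_{23} + m^2h^4 g_2^2 (I_{11}I_{23} + 2I_{12}I_{13}) \\
&+ (mh^2 + 2m^2h^4)\,g_3 I_{123} + m^2h^4 g_3 (I_{112} + I_{113}) + mh^2 g_4 I_{1123}, \\
E\{N_{[1]}N_{[2]}N_{[3]}N_{[4]}\} ={}& m^4h^8 + m^3h^6 g_2 (I_{12} + I_{13} + I_{14} + I_{23} + I_{24} + I_{34}) \\
&+ m^2h^4 g_2^2 (I_{12}I_{34} + I_{13}I_{24} + I_{14}I_{23}) \\
&+ m^2h^4 g_3 (I_{123} + I_{124} + I_{134} + I_{234}) + mh^2 g_4 I_{1234}.
\end{aligned} \tag{4·9}$$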

The details of the calculations are omitted, as much tedious manipulation is involved. 
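For the second-degree term the integral over a single square quadrat can be written in closed form, which makes the first line of the expectations above easy to evaluate. The sketch below is an illustration only: the closed form for $I_{11}$ is derived here from the normal form of $\phi(A_1, A_2)$ and is not quoted from the paper, and the function names are hypothetical.

```python
import math


def i11(h, sigma):
    """I_11: (1/h^2) times the integral of phi(A1, A2) over one quadrat of side h.

    phi(A1, A2) factorises into two one-dimensional normal densities of the
    coordinate differences (variance 2 sigma^2), so the double integral is the
    square of a one-dimensional integral, evaluated here in closed form.
    """
    one_dim = (h * math.erf(h / (2 * sigma))
               + (2 * sigma / math.sqrt(math.pi)) * math.expm1(-h * h / (4 * sigma * sigma)))
    return (one_dim / h) ** 2


def second_moment(m, h, sigma, g2):
    """E{N^2} for a single quadrat: mh^2 + m^2 h^4 + mh^2 g_2 I_11."""
    mh2 = m * h * h
    return mh2 + mh2 ** 2 + mh2 * g2 * i11(h, sigma)


def variance_to_mean(h, sigma, g2):
    """Ratio of variance to mean of the count in a single quadrat: 1 + g_2 I_11."""
    return 1.0 + g2 * i11(h, sigma)
```

As $\sigma$ tends to zero $I_{11}$ tends to 1 and the ratio tends to $1 + g_2$; as $\sigma$ becomes large $I_{11}$ tends to zero and the ratio tends to the Poisson value of unity.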


Table 1. Expected mean squares for the non-random model
(rows: $n_k$; $E(s_k/mh^2)$)


There are many different numerical examples possible, since there are two sets of
parameters to be varied, those of $p(n)$ and those of $f(r)$. A study of the effects of different
functions $p(n)$ and $f(r)$ on the form of the distribution of $N_{[1]}$, the number of plants in a
single quadrat, has already been made (Thompson, 1954). Table 1 gives the expected mean
squares $E\{s_k/mh^2\}$ ($s_k = S_k/n_k$), for the particular case $\sigma = h/\sqrt{2}$ (which gives a
simplification of the numerical integrations) and a binomial distribution for $p(n)$ with a mean
of three offspring per group. The parameters in $p(n) = \binom{N}{n} p^n q^{N-n}$ ($n = 0, 1, \ldots, N$) are
$N = 6$, $p = \tfrac12$, and $g_r = (N-1)(N-2) \ldots (N-r+1)\,p^{r-1}$. The set of expected mean squares
$E(s_k)$ has upper and lower limits, reached as $\sigma$ tends to 0 and $\infty$ respectively; they are



$mh^2(1 + g_2)$ and $mh^2$, independent of $k$. These are both mean squares for a Poisson
distribution, and are to be expected, since in the first case each offspring's position coincides
with that of its parent, and in the second case there is effectively no clumping.
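The constants $g_r$ for the binomial example can be evaluated exactly. The following sketch (hypothetical function names, exact rational arithmetic) computes $g_r$ from the definition as a ratio of a factorial moment to the mean and compares it with the closed form quoted for the binomial law with $N = 6$, $p = \tfrac12$.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(N, p):
    """Probabilities p(n), n = 0, ..., N, of the binomial law."""
    return [comb(N, n) * p ** n * (1 - p) ** (N - n) for n in range(N + 1)]


def falling_factorial(n, r):
    """n (n - 1) ... (n - r + 1)."""
    out = 1
    for i in range(r):
        out *= n - i
    return out


def g(r, pmf):
    """g_r = E{n (n - 1) ... (n - r + 1)} / E(n) for an offspring law pmf."""
    mean = sum(n * prob for n, prob in enumerate(pmf))
    moment = sum(falling_factorial(n, r) * prob for n, prob in enumerate(pmf))
    return moment / mean


def g_binomial(r, N, p):
    """Closed form (N - 1)(N - 2) ... (N - r + 1) p^(r - 1) for the binomial law."""
    out = p ** (r - 1)
    for i in range(1, r):
        out *= N - i
    return out


pmf = binomial_pmf(6, Fraction(1, 2))
g_values = {r: g(r, pmf) for r in (2, 3, 4)}
upper_limit_ratio = 1 + g_values[2]  # limiting mean square divided by mh^2
```

With these parameters $g_2 = 5/2$, so the limiting mean square as $\sigma$ tends to zero is $3.5\,mh^2$.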

The standard error of a sum of squares is not in itself a very useful statistic, because of
the skew distribution of $S_k$, at least for a small number of degrees of freedom. For the
first three mean squares of Table 1, with fairly large $n_k$ (and for which normal approximations
might be expected to hold), the standard errors are, respectively, 0·20, 0·27, 0·46 for $mh^2 = 1$,
and 0·13, 0·16, 0·26 for $mh^2 = 3$. A more useful practical method of studying the sampling
variation is to determine the limits of error of individual mean squares, so that the set of
expected mean squares for a given model has appropriate to it 'significance bands' outside
which terms observed from samples of the model would only be expected to fall with a given
(small) probability. In this approach we ignore the fact that a real correlation exists between
any pair of mean squares, except for the case when the $N_{ijl}$ are independent normal variables,
and calculate limits of error on the assumption of approximate $\chi^2$ distributions for individual
mean squares, based on their first two moments, since we know that for $N_{ijl}$ normal (variance
$\sigma^2$ say) $S_k/\sigma^2$ is distributed exactly as $\chi^2$ with $n_k$ degrees of freedom. Table 2 gives these
limits of error for the case $mh^2 = 1$ only, but to a first approximation they may be taken to
apply to mean squares $S_k/(mh^2)$ so long as $mh^2$ is not too different from unity. The bands are
symmetrical, in the sense that the probability of an observed mean square being above the
upper limit equals the probability of its falling below the lower limit.


Table 2. Approximate p% significance bands for the mean
squares of Table 1 ($mh^2 = 1$)


k               2      4      8      16     32     64     128    256

95% band      0·76   0·72   0·86   0·72   0·62   0·28   0·06   0·00
              1·55   1·78   2·65   3·50   5·38   1·28   10·88  15·22

80% band      0·86   0·86   1·07   0·99   0·96   0·62   0·26   0·03
              1·39   1·55   2·25   2·78   4·08   5·08   6·72   8·18


It is instructive to compare these figures (applying to a model with not a very high degree
of contagion) with the results for a purely random distribution (parents alone, say), which
may be shown to approximate very closely to the normal case. The correlations between
individual mean squares are in this random case almost negligible, and $S_k/mh^2$ follows almost
exactly a $\chi^2$ distribution with $n_k\{2kmh^2/(1 + 2kmh^2)\}$ degrees of freedom. For a Poisson
distribution, we have explicitly $E(S_k) = mh^2$,

$$\operatorname{var}(S_k) = mh^2(1 + 2kmh^2)/256. \tag{4·10}$$

Table 3 gives the limits of error for the case $mh^2 = 1$, but as in Table 2 they may equally
well apply for other values of $mh^2$. In fact, for $mh^2 = \tfrac12, \infty$, which are symmetrically placed
with respect to $mh^2 = 1$ here, the differences from the values given in the tables are never
more than $\pm$0·02.
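The Poisson moments and the equivalent degrees of freedom quoted above can be tabulated directly. In the sketch below the function names are hypothetical, and $S_k$ in (4·10) is read as the mean square $S_k/n_k$ on the 256-quadrat grid, which is an interpretation rather than something stated explicitly in the text.

```python
def poisson_mean_square_moments(k, mh2, quadrats=256):
    """Mean and variance of the mean square for block size k, Poisson counts."""
    mean = mh2
    variance = mh2 * (1 + 2 * k * mh2) / quadrats
    return mean, variance


def equivalent_degrees_of_freedom(k, mh2, quadrats=256):
    """n_k * 2 k mh^2 / (1 + 2 k mh^2), with n_k = quadrats / k."""
    n_k = quadrats / k
    return n_k * 2 * k * mh2 / (1 + 2 * k * mh2)


block_sizes = (2, 4, 8, 16, 32, 64, 128, 256)
table = {k: (poisson_mean_square_moments(k, 1.0), equivalent_degrees_of_freedom(k, 1.0))
         for k in block_sizes}
```

The degrees of freedom agree with matching the first two moments: a variable distributed as $mh^2\chi^2_f/f$ has mean $mh^2$ and variance $2(mh^2)^2/f$, and equating the latter to (4·10) gives the expression used in `equivalent_degrees_of_freedom`.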




Table 3. Approximate p% significance bands for the mean squares
from a Poisson distribution ($mh^2 = 1$)


k               2      4      8      16     32     64     128    256

95% band      0·75   0·67   0·56   0·42   0·27   0·12   0·03   0·00
              1·29   1·40   1·56   1·82   2·20   2·79   3·70   5·03

80% band      0·83   0·77   0·69   0·58   0·43   0·26   0·10   0·02
              1·18   1·25   1·34   1·48   1·68   1·95   2·30   2·71


5. DISCUSSION


When an analysis of variance is carried out on a set of observational data, the usual aim is
to test the homogeneity of the set of variances obtained, employing the $F$ test. With a
grid of quadrats, the appropriate method would be to test the ratio $(S_k/n_k)/(S_2/n_2)$, which,
on the null hypothesis that the numbers $N_{ijl}$ in the quadrats are independently and normally
distributed with the same variance, follows the $F$ distribution with $n_k$ and $n_2$ degrees of
freedom. The conditions for the applicability of this test will not, however, generally hold
in the present ecological application, for the associations of plants in clumps result in the
non-independence of the numbers in neighbouring quadrats, and even more distant quadrats
if the clumps themselves are related, and the effect of clumping is usually to produce
a contagious distribution of the quadrat numbers. Under the usual null hypothesis, $H_0$ say,
we have

$$\Pr\{F > F_\alpha \mid H_0\} = \alpha, \tag{5·1}$$

where $F = (S_k/n_k)/(S_2/n_2)$, and $F_\alpha$ is the value of $F$ on $n_k$ and $n_2$ degrees of freedom at the
$100\alpha\%$ level of significance. The expected value of $F$ on this hypothesis is unity; however,
for most non-random models, including the example above, it is greater. (It may occasionally
be less than unity.) Therefore, to study the power function of the $F$ test we should
ideally calculate the probability

$$\Pr\{F/E(F) > F_\alpha \mid H\} = \beta, \tag{5·2}$$

where $H$ is the non-normal hypothesis relating to the particular model, and compare
$\beta$ with $\alpha$.

This has been done approximately for the example of §4, for the two cases $k = 4, 8$ and
with $\alpha = 0·05, 0·01$, the results being given in Table 4. The method used is one recommended
by David & Johnson (1951). Rewrite (5·2) in the form

$$\Pr\{S_k - aS_2 \ge 0 \mid H\} = \beta, \tag{5·3}$$

where $a = n_k F_\alpha E(F)/n_2 = 2F_\alpha E(F)/k$. The first two moments of $S = S_k - aS_2$ are sufficient
to find this probability approximately, transforming $S$ so that probability levels for the
$\chi^2$ distribution can be used. A new statistic $S/(mh^2) + C$ is formed whose first two moments
are exactly those of a $\chi^2$ distribution with $f$ degrees of freedom by taking

$$C = \tfrac12\operatorname{var}\{S/(mh^2)\} - E\{S/(mh^2)\}, \qquad f = \tfrac12\operatorname{var}\{S/(mh^2)\}.$$

Then $\Pr\{\chi^2_f \ge C\}$ furnishes $\beta$. The method is highly satisfactory when the $N_{ijl}$ are Poisson
or normal variables. These results show that the actual level of significance being used is




Table 4. Values of $\beta$ (from (5·3)) for the model of §4


а = 0:05 а = 0:01 


o0 H 1 со 
0-055 0-033 0-023 0:013 
0-089 0:048 0-038 0-028 


quite close to its assumed true value. However, we have assumed an a priori knowledge of
$E\{F\}$, which would not normally be so in practice. A truer picture of what would happen if
the $F$ test were applied indiscriminately is given in Table 5, where we have calculated

$$\Pr\{F > F_\alpha \mid H\} = \beta'. \tag{5·4}$$

In this particular example, we see that the use of the $F$ test on unadjusted variances is of
no value at all for $k$ greater than 4, and even for $k = 4$ the differences between $\beta'$ and $\alpha$
are becoming appreciable.
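The moment-matching step used to approximate $\beta$ can be sketched as follows. This is an illustration, not the computation reported in the tables: the function name is hypothetical, and the mean and variance of $S/(mh^2)$ are assumed to have been obtained already from the expectations of §4.

```python
def chi_square_match(mean, variance):
    """Return (C, f) such that S/(mh^2) + C has the first two moments of chi^2_f.

    A chi-square variable with f degrees of freedom has mean f and variance 2f,
    so f is half the variance and C shifts the mean up to f.
    """
    f = 0.5 * variance
    c = f - mean
    return c, f


# Hypothetical moments of S/(mh^2), for illustration only.
c, f = chi_square_match(mean=-3.0, variance=40.0)
# beta is then the upper-tail probability Pr{chi^2_f >= c}; it could be read
# from tables, or evaluated with scipy.stats.chi2.sf(c, f) if SciPy is available.
```

The shift leaves the variance unchanged, so only the mean has to be corrected once $f$ is fixed.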


Table 5. Table of $\beta'$ (from (5·4)) for the model of §4


Рт T 
& — 0-05 a = 0-01 
i 1 со 
0-054 0-040 0:027 
0:350 0:342 0-334 


We have only discussed one particular example in this paper. Several others giving rise 
to non-random distributions of plants have been derived, consistent with the mathematical 
and computational difficulty involved. They are described at length in Thompson (1955), 
with more emphasis on the ecological application, however, so that it may be of interest to 
note some of them briefly here. We can extend the model of § 4 to include several generations 
of plants, the distances from each offspring to its parent still following the isotropic normal 
distribution (4-1). The offspring of the original parent become the parents of another genera- 
tion of offspring, which in their turn become parent plants, and so on. The distance between 
any two plants of the same group also follows (4-1), but with parameter depending on the 
number of direct steps between the two plants. If the original parents are randomly dis- 
tributed, we find similarly to (4-4), 


Sa(A,,A3)—m? = ma En.) etat omaat, (5-5) 


where n, is the number of pairs of plants in the group which have the distances between 
them distributed with parameter a c. 




Models using the isotropic normal distribution for $f(r)$ produce over-dispersion of
individuals (ratio of variance to mean in a single quadrat greater than unity). If the $x$ and $y$
components of the distance between successive plants in a chain follow independent $\chi^2$
distributions with $2f$ degrees of freedom ($f \ge 2$), then under-dispersion of individuals results
because of the 'negative correlation' effect introduced (cf. Bartlett, 1954). The frequency
function of the distance between plants $g$ generations apart is easily obtained, being simply
the product of two independent $\chi^2$ distributions with $2fg$ degrees of freedom. For a group
with $n$ plants (the $n$th being the offspring of the $(n-1)$th, and so on), the original parent
being randomly distributed and with $f = 2$, we have


$$f_2(A_1, A_2) - m^2 = \frac{m}{n}\sum_{g=1}^{n-1} (n - g)\,(\lambda_1\lambda_2)^{2g}\,
 [(x_2 - x_1)(y_2 - y_1)]^{2g-1}\,e^{-\lambda_1(x_2 - x_1) - \lambda_2(y_2 - y_1)}/[(2g - 1)!]^2, \tag{5·6}$$

where $\lambda_1$, $\lambda_2$ are arbitrary and are considered in relation to the quadrat size $h$, and
$x_2 > x_1 \ge 0$, $y_2 > y_1 \ge 0$. This represents approximately an actual field example, in which a
definite forward move is made in each generation due to vegetative spreading of the plants,
and the probability of having two successive plants very close together is small. The other
models discussed are developments of these two basic types and have no intrinsic mathematical
interest. In all of them, however, an attempt has been made to keep the ecological aspect
in mind so that they should be descriptive in some way of an idealized plant community.


I wish to acknowledge gratefully my indebtedness to Prof. M. S. Bartlett for suggesting
this subject of research, and for his advice and assistance during my investigations at
Manchester University. I also wish to thank the New Zealand Department of Scientific
and Industrial Research for financial assistance at that time.


REFERENCES 


BARTLETT, M. S. (1954). Processus stochastiques ponctuels. Ann. Inst. H. Poincaré, 14, 35.

BHABHA, H. J. (1950). On the stochastic theory of continuous parametric systems and its application
to electron cascades. Proc. Roy. Soc. A, 202, 301.

DAVID, F. N. & JOHNSON, N. L. (1951). The effect of non-normality on the power function of the F test
in the analysis of variance. Biometrika, 38, 43.

GREIG-SMITH, P. (1952). The use of random and contiguous quadrats in the study of the structure of
plant communities. Ann. Bot., N.S., 16, 293.

RAMAKRISHNAN, A. (1950). Stochastic processes relating to particles distributed in a continuous
infinity of states. Proc. Camb. Phil. Soc. 46, 595.

STEVENS, W. L. (1937). Significance of grouping. Ann. Eugen., Lond., 8, 57.

THOMPSON, H. R. (1954). A note on contagious distributions. Biometrika, 41, 268.

THOMPSON, H. R. (1955). Statistical study of plant distribution patterns using a grid of contiguous
quadrats (in the Press).

WOLD, H. (1949). Sur les processus stationnaires ponctuels. Le calcul des probabilités et des
applications. Publications du C.N.R.S., 13.




THE OUTCOME OF A STOCHASTIC EPIDEMIC—A NOTE ON 
BAILEY’S PAPER 


By P. WHITTLE 


Applied Mathematics Laboratory, New Zealand Department 
of Scientific and Industrial Research 


In a recent paper (1953) N. Bailey has considered a stochastic epidemic model of the type set up by
Bartlett (1949), and has shown that the probability distribution $(P_w)$ of the ultimate number of
infected individuals $(w)$ may be calculated by solving a certain set of doubly recurrent relations.
I propose to show that for quite a general case these same probabilities may be obtained by the
solution of a set of singly recurrent relations (eqs. (17) and (24)). Furthermore, an expression may
be derived (eqs. (38) and (40)) for the probability that an infection introduced into a large population
will 'take'; this provides a stochastic equivalent to Kermack & McKendrick's threshold theorem
(1927 and later).


1. INTRODUCTORY 


Following Bailey, we shall assume that an initial $a$ infectious cases are introduced into a
population of $n$ uninfected but susceptible individuals, and that the probability that after
a time $t$ there are $r$ susceptibles still uninfected and $s$ infectious cases not yet removed
is $p_{rs}(t)$. Let $Bs\,\Delta t$ be the probability that an infectious individual is removed in the
infinitesimal time interval $(t, t + \Delta t)$, and let $A_r s\,\Delta t$ be the corresponding probability that a
new infection takes place. No particular form is assumed for the function $A_r$ at the moment.
The development of the probabilities $p_{rs}$ is then governed by the relations

$$\frac{\partial p_{rs}}{\partial t} = A_{r+1}(s-1)\,p_{r+1,s-1} + B(s+1)\,p_{r,s+1} - (A_r s + Bs)\,p_{rs}, \tag{1}$$

which become

$$(\lambda + A_r s + Bs)\,q_{rs} = A_{r+1}(s-1)\,q_{r+1,s-1} + B(s+1)\,q_{r,s+1} + \delta_{rn}\delta_{sa}, \tag{2}$$

if the transformation

$$q_{rs} = \int_0^\infty e^{-\lambda t}\,p_{rs}(t)\,dt \tag{3}$$

is performed. As Bailey observes, the probability of an epidemic of total size $w$ is then

$$P_w = B f_{n-w,1}, \tag{4}$$

where

$$f_{rs} = \lim_{\lambda \to 0} q_{rs}. \tag{5}$$

This limit exists as long as $s > 0$, and the equations regulating the appropriate $f_{rs}$ are
obtained simply by setting $\lambda$ equal to zero in equation (2), with $s = 1, 2, \ldots$. These are, in effect,
the equations used by Bailey for the computation of the $P_w$.


2. ESTABLISHMENT OF THE RECURRENCE RELATIONS


If we write 
ho =0, А, = Sfp, (8 =1,2,...), (6) 
B 
«T REP " 
i Ara 
B. = A,+B’ (8) 


Р. WHITTLE 117 
then the equations for the f,, take the form 


hre = а,ћ, esa Php sp (9) 
Nag = anns + n) (r =n—1,n—2,...; 8 = 1,2,...). (10) 
n+ 
Now, let Hz) = Y hann. (11) 
s=1 


Multiplying (9) by z**" and summing over s we find 


23 
Нда) = ga, Pe Henn) —a,h,]. (12) 
А direct solution of (9) shows that 


ы = P Hs) (13) 
т 
as, indeed, it must, if the expression (12) for H, is to constitute a finite series in x. We have 
thus p 
нда) = P Hala) — Hasen (14) 
a relation which certainly holds for r = »— 1, n—2, ..., and also for r = n if we introduce 
a function 
х% 
Han) = Жек (15) 
n+l 
(cf. eq. (10)). Further, by (4), (6) and (13) we have 
h= pies Harn): (16) 
nw 


The required probabilities P, can thus be derived by solving the simply recurrent relation 
(14) with initial condition (15), and then using (16). By doing almost precisely this we shall 
obtain an explicit recurrence relation for the P,. With the help of (14) and (15) H, ,,44(*) 
can be expressed in terms of H, 20924032): Hy (Ona) Haa). Setting ж = a, ,, in this 
expression and substituting for the H,.,,(«,) from (16) we find 


E a. 

n-w p = a 
У Knut n-uKn-ut2, nae "55 Kn—w,n—u Eo Ка—и+1, n-uKn-ut2,n—u *** Mr Jat. 
we n—w 


(17) 
where A mH i 
Kp =, Ke БЕЛМ ("= 8). (18) 
The final relation of (17), for u = n, reduces to 
EP, =1, (19) 


provided that A, is zero, as it must be. 
Complications arise if any of the A,’s (and consequently the corresponding «,’s) are equal, 
When this happens P, involves not only Н, ,,(x) but also its derivatives. The extreme case 
in this direction is that for which A, is constant for r> 0 
A,=A (т=1,2,...) 
A, — 0, 


(20) 


118 The outcome of a stochastic epidemic 


as is approximately the case during the initial stages of an epidemic if the number of
susceptibles is large. The appropriate modification of (17) may be derived by a repeated
application of de l'Hôpital's rule, and may be shown by induction to have a solution

$$P_w = \frac{A^w B^{a+w}}{(A+B)^{a+2w}}\,\frac{a\,(a+2w-1)!}{w!\,(a+w)!} \quad (w = 0, 1, \ldots, n-1),
 \qquad P_n = 1 - \sum_{w=0}^{n-1} P_w. \tag{21}$$

(Formula (21) is a generalization of one obtained by D. G. Kendall in a slightly different
context, see formula (52) of his paper of 1948.) We shall return to this case later, and shall
for the moment consider the more usual alternative


A, = Cr, (22) 
where C is a constant. We have 
AE в OFFI) ) 
^ 7 врс ” BeOr* (23) 
EN Bir +1) Lar 1) 
" (B--Cs)(r-s)  r—s , 
and (17) becomes 
и (n—wY _ CORR e 
ul gu poss (o) ade OST, usnm). (24) 
For computational purposes it is convenient to consider instead of P, the quantity 
n —w) 
Q,- P n, (25) 
for which D ®в-и@ш _ Camu (26) 
w=0 (Uu—w)! u! 


3. THE PROBABILITY OF EPIDEMIC


Let us now return to expression (21). We shall use this as a comparison formula for
establishing the behaviour of more refined models. Note, however, that the model upon which
it is based is a perfectly valid one, which for quite large ranges of $w$ is more realistic than that
corresponding to assumption (22), since this assumption requires that the population mix
homogeneously, a requirement never fulfilled in a large population.
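
As an illustration only, formula (21) can be evaluated directly. The sketch below is in Python; the function name `s_w` and the sample values A = 3, B = 1 (so that 30B/A = 10, as in the case tabulated next) are assumptions made for this example and are not part of the paper.

```python
from math import comb

def s_w(w, a, A, B):
    """Formula (21): probability of total size w when the infection intensity is the constant A."""
    p = A / (A + B)   # chance that the next event is an infection
    q = B / (A + B)   # chance that the next event is a removal
    # a (a+2w-1)! / (w! (a+w)!) is the same number as a/(a+2w) * C(a+2w, w)
    return a * comb(a + 2 * w, w) * p**w * q**(a + w) / (a + 2 * w)

# First few values for a = 2 and A/B = 3 (hypothetical choice of scale B = 1)
first_values = [s_w(w, a=2, A=3.0, B=1.0) for w in range(5)]
```

Only the ratio A/B matters here, since (21) depends on A and B through A/(A+B) and B/(A+B).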

The following table gives the first few values of $P_w$ as calculated from formulae (21) and
(24) for the case a = 2, n = 30, $\rho$ = B/C = 30B/A = 10:


The difference between the two sets of probabilities is small but increases with increasing $w$.




We shall now adopt the following definition: it shall be said that an epidemic has (has not)
taken place if the total proportion of susceptibles which become infected exceeds (does
not exceed) a predetermined fraction $\gamma$. With this definition the probability of no epidemic is
$$\pi_n = \sum_{w=0}^{n\gamma} P_w, \qquad (27)$$
where $P_w$ is in general given by (17).
Consider now two other models for which the infection intensities are given by $A'_r$, $A''_r$.
We shall assume that in all three cases the infection intensity is non-decreasing with
increasing $r$:
$$A_{r+1} \ge A_r, \quad A'_{r+1} \ge A'_r, \quad A''_{r+1} \ge A''_r \quad (r = 0, 1, \ldots). \qquad (28)$$
We shall further assume that, at least for the range $n \ge r \ge n(1-\gamma)$, the intensity for the
first model lies uniformly between those for the other two:
$$A'_r \ge A_r \ge A''_r. \qquad (29)$$
It is then intuitively evident that
$$\sum_{0}^{n\gamma} P'_w \le \sum_{0}^{n\gamma} P_w \le \sum_{0}^{n\gamma} P''_w. \qquad (30)$$
Suppose now that the intensities for those two comparison models have the constant values
$$A'_r = A_n, \quad A''_r = A_{n(1-\gamma)} \quad (r > 0), \qquad (31)$$
while $A'_0 = A''_0 = 0$. Condition (29) is thus fulfilled, and the inequality (30) becomes
$$\sum_{0}^{n\gamma} S_w(A_n) \le \pi_n \le \sum_{0}^{n\gamma} S_w(A_{n(1-\gamma)}), \qquad (32)$$
where we have used $S_w(A)$ to denote the expression in (21).
Consider now the partial sum $\sum_{0}^{n\gamma} S_w(A)$ as $n$ becomes large. We have
$$\frac{S_{w+1}}{S_w} = \frac{AB}{(A+B)^2}\,\frac{(a+2w+1)(a+2w)}{(w+1)(a+w+1)} \to \frac{4AB}{(A+B)^2} = 4k \ \text{(say)} \quad (w \to \infty). \qquad (33)$$
The quantity $4k$ will be less than unity, except for the case $A = B$, which we shall exclude
for the moment. We can thus write
$$\sum_{0}^{n\gamma} S_w(A) = \sum_{0}^{\infty} S_w(A) - R_n(A), \qquad (34)$$
where
$$R_n(A) = \sum_{n\gamma+1}^{\infty} S_w(A) < S_{n\gamma}(A)\,[(4k) + (4k)^2 + \ldots] = O[(4k)^{n\gamma}]. \qquad (35)$$
The infinite sum in (34) has the value
$$\sum_{0}^{\infty} S_w(A) = \left[\frac{A + B - |A - B|}{2A}\right]^a. \qquad (36)$$


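Combining (32), (34), (35) and (36) we have thus
$$\left[\frac{A_n + B - |A_n - B|}{2A_n}\right]^a - O[(4k)^{n\gamma}] \le \pi_n \le \left[\frac{A_{n(1-\gamma)} + B - |A_{n(1-\gamma)} - B|}{2A_{n(1-\gamma)}}\right]^a. \qquad (37)$$

For large $n$ the remainder term in the first member of inequality (37) is quite negligible,
and we shall no longer include it. Note now from (36) that we have $\sum S_w(A) = 1$ or $(B/A)^a$
according as $A$ is less or greater than $B$. Setting these evaluations in (37) we find that there
are at least three distinct cases:

$A_n > B$, $A_{n(1-\gamma)} > B$: $(B/A_n)^a \le \pi_n \le (B/A_{n(1-\gamma)})^a$;
$A_n > B$, $A_{n(1-\gamma)} < B$: $(B/A_n)^a \le \pi_n \le 1$; (38)
$A_n < B$, $A_{n(1-\gamma)} < B$: $\pi_n = 1$.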
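
The evaluation of the infinite sum quoted from (36) can be checked numerically. The following Python sketch accumulates the terms of (21) through the ratio of successive terms; the function name, the number of terms and the sample values are assumptions chosen for illustration, not part of the original argument.

```python
def no_epidemic_limit(a, A, B, terms=5000):
    """Sum S_w(A) of (21) over w = 0, ..., terms - 1, using the ratio S_{w+1}/S_w."""
    k = A * B / (A + B) ** 2
    s = (B / (A + B)) ** a          # S_0, the chance that all a cases are removed first
    total = 0.0
    for w in range(terms):
        total += s
        # ratio of successive terms of (21)
        s *= k * (a + 2 * w + 1) * (a + 2 * w) / ((w + 1) * (a + w + 1))
    return total

above_threshold = no_epidemic_limit(a=2, A=3.0, B=1.0)   # should be close to (B/A)^a
below_threshold = no_epidemic_limit(a=2, A=1.0, B=3.0)   # should be close to 1
```

With A and B well separated the terms decay geometrically at rate 4k, so a few thousand terms are far more than needed; near A = B the convergence becomes slow, which matches the exclusion of that case in the text.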
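We may sum up the situation in terms of Bailey's removal ratio
$$\rho_n = B/C = nB/A_n, \qquad (39)$$
the ratio of removal and infection rates for a population of size $n$:

For $\rho_n < n$ and $\rho_{n(1-\gamma)} < n(1-\gamma)$, the probability of epidemic lies between
$$1 - \left(\frac{\rho_n}{n}\right)^a \quad\text{and}\quad 1 - \left(\frac{\rho_{n(1-\gamma)}}{n(1-\gamma)}\right)^a.$$
For $\rho_n < n$ and $\rho_{n(1-\gamma)} > n(1-\gamma)$, the probability of epidemic lies between
zero and $1 - (\rho_n/n)^a$. (40)
For $\rho_n > n$ and $\rho_{n(1-\gamma)} > n(1-\gamma)$, the probability of epidemic is zero.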


These statements provide an equivalent in the stochastic case to Kermack & McKendrick's
threshold theorem in the deterministic case, at least for the case of large populations.
Since for large $n$ the ratio $\rho_n/n = B/A_n$ will tend to an almost constant value, statements
(40) may be roughly condensed to

For $\rho_n < n$ the probability of epidemic is $1 - (\rho_n/n)^a$. (41)

For $\rho_n > n$ the probability of epidemic is zero.
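
The condensed statements (41) amount to a simple rule, sketched below in Python. The function name and the decision to raise an error at the transition value are assumptions of this example; the approximation is only meant for large populations.

```python
def epidemic_probability(rho_n, n, a):
    """Approximate probability of an epidemic from statements (41).

    rho_n: removal ratio for a population of n susceptibles
    a:     initial number of infected individuals
    """
    if rho_n == n:
        # the transition case is not covered by the approximate argument
        raise ValueError("rho_n = n is the transition case, not covered by (41)")
    if rho_n < n:
        return 1.0 - (rho_n / n) ** a
    return 0.0

example = epidemic_probability(rho_n=10, n=30, a=2)
```

For the values used earlier (a = 2, n = 30, removal ratio 10) the rule gives 1 - (1/3)^2.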


The transition case $\rho_n = n$ cannot be adequately treated by approximate considerations
of the present type. Equation (28) indicates that the probability of completion is of fairly
constant magnitude for small $w$, roughly of order $(\tfrac12)^a$. As the number of susceptibles
diminishes, however, the critical value of $\rho$ will fall, and it seems likely that the epidemic
will eventually be halted, although only after having made appreciable inroads on the
population.

The statements (36) are reminiscent of similar statements concerning the natural extinction
of populations (cf. Bartlett, 1946) and could have been derived by regarding the group
of infected persons as a population with birth and death rates $A_n$ and $B$. To reason in this
way is unsatisfactory, however, since the condition that the infected group shall ultimately
disappear provides no guarantee that infection will be confined to a preassigned fraction
of the population of susceptibles.

The argument above is probably more illuminating in the following intuitive form. The
probability distribution $P_w$ of (24) presents two different forms according as $A$ is less than
or greater than $B$ (see Fig. 1). In both cases $P_w$ dwindles with increasing $w$, and if the
population size is large enough to permit $w$ to take large values $P_w$ will finally approach zero. In
the case $A < B$ the sum of the $P_w$ up to this stage will approach unity. In the case $A > B$ this
sum will have some value less than unity, say $1 - \alpha$, so that $P_n$ must have a finite value $\alpha$
if relation (19) is to be fulfilled.


[Figure: four panels showing $P_w$ plotted against $w$, with $n\gamma$ and $n$ marked on the $w$-axis: n small, A < B; n small, A > B; n large, A < B; n large, A > B.]

Fig. 1. The appearance of the distribution curve $P_w$, of the ultimate number of infected
persons, on the assumption of a constant infection rate.


For large $n$ the probability of no epidemic
$$\pi_n = \sum_{0}^{n\gamma} S_w(A) \qquad (42)$$
will be equal to the area under the initial part of the curve: unity if $A < B$, $1 - \alpha$ if $A > B$.

The fact that all probability mass which does not fall in the first J-shaped part of the
curve falls at $w = n$ indicates that either the epidemic keeps within bounds (probability
$1 - \alpha$) or it infects the entire population (probability $\alpha$).

bes ed A show a similar, although less extreme behaviour. Thus the 
distribution curves calculated by Bailey are either J-shaped or U-shaped, depending upon 
the relative values of the removal ratio and the population size. The J-shaped curves 




correspond to cases in which the infection is almost certainly confined to a small proportion 
of the population. The U-shaped curves correspond to cases in which the infection strikes at 
either a small proportion or a large proportion of the population, the probabilities of these 
two alternatives being equal to the integrals of the corresponding limbs of the probability 
distribution. 

What our argument asserts in effect is that for a large population the form and integral
of the first limb of the distribution $P_w$ are equal to those calculated on the assumption of a
constant infection intensity.


REFERENCES 


BAILEY, N. T. J. (1953). The total size of a general stochastic epidemic. Biometrika, 40, 177.

BARTLETT, M. S. (1946). Stochastic Processes. Mimeographed, North Carolina lecture notes.

BARTLETT, M. S. (1949). Some evolutionary stochastic processes. J. R. Statist. Soc. B, 11, 211.

KENDALL, D. G. (1948). On the generalized 'birth-and-death' process. Ann. Math. Statist. 19,712

KERMACK, W. O. & MCKENDRICK, A. G. (1927 and later). Contributions to the mathematical theory
of epidemics. Proc. Roy. Soc. A, 115, 700; 138, 55; 141, 94.




A NOTE ON BAILEY'S AND WHITTLE'S TREATMENT OF
A GENERAL STOCHASTIC EPIDEMIC


By F. G. FOSTER
Division of Research Techniques, London School of Economics


1. Introduction. In a recent paper Bailey (1953) obtained a set of doubly recurrent relations
for the probability distribution ($P_w$) of the total size ($w$) of a stochastic epidemic.
Bailey also quotes an explicit formula for $P_w$ due to the author but observes that this is not
suitable for computation. In a note on Bailey's paper, Whittle (1955)* showed that the $P_w$
could be more simply calculated from a set of singly recurrent relations.

In the present note, Whittle's relations are considered. The notation is simplified by use
of symmetric functions, and it is shown how a set of singly recurrent relations may be
obtained for $P_w$ in a quite general case by use of a simple probability argument. Whittle's
relations may then be re-derived as a special case of these relations.

Following Bailey and Whittle, we consider a population which consists initially of $n$
uninfected but susceptible individuals and $a$ infected individuals. At any time $t$, if there
are $r$ susceptibles and $s$ infected cases, the probability of one new infection taking place in
time $dt$ is $rs\,dt$ and the probability of one infected being removed from circulation is $\rho s\,dt$.
The epidemic ends whenever either all infected have been removed or the whole population
of $a+n$ individuals has become infected. If when the epidemic ends the number of new
infections is $w$ (i.e. not counting the original $a$ infections), we say that the epidemic is of
total size $w$, and we denote by $P_w$ the probability of this event.

2. The problem may be treated as a random walk if we fix attention on the fluctuating
number of newly infected, $\xi$, present at any time. At any instant, $\xi$ may be either increased
by unity or decreased by unity, and we consider the sequence of instants at which such
changes occur.

Let us now represent the behaviour of $\xi$ by the motion of a particle on a rectangular
lattice, such that a move from $(x, y)$ to $(x+1, y)$ represents a unit increase in $\xi$, and a move
from $(x, y)$ to $(x, y+1)$ represents a unit decrease. Thus, at any instant, the $x$-co-ordinate
will represent the total number of infected cases (in addition to the initial $a$) and the
$y$-co-ordinate the total number of removals which has occurred up to that instant. We suppose
that the probabilities of the moves from $(x, y)$ to $(x+1, y)$ and to $(x, y+1)$ are respectively
$\lambda_x$, $\mu_x$ ($\lambda_x + \mu_x = 1$), which in general depend on $x$. For example, in Bailey's problem:
$$\lambda_x = \frac{n-x}{n-x+\rho}, \qquad \mu_x = \frac{\rho}{n-x+\rho}.$$
The motion starts at (0, 0), and stops as soon as any point of the barriers,
$$x = n, \qquad y = x + a,$$
is reached. This corresponds to the end of the epidemic, and we are interested in the
probability $P_w$ that the particle stops on $(w, a+w)$ ($w = 0, 1, \ldots, n-1$). In the general case we
have
$$P_n = 1 - \sum_{w=0}^{n-1} P_w.$$
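
The random walk just described can be followed directly on the lattice. The Python sketch below is not the recurrence derived in this note; it simply propagates the probability of reaching each interior lattice point and collects the mass absorbed on the two barriers. The function name and the argument `lam` (a function returning the probability of a step to the right at abscissa x) are assumptions of this example.

```python
def total_size_distribution(n, a, lam):
    """Return [P_0, ..., P_n] for the walk stopped at x = n or y = x + a (a >= 1).

    lam(x) is the probability of a move (x, y) -> (x + 1, y); the move
    (x, y) -> (x, y + 1) then has probability 1 - lam(x).
    """
    P = [0.0] * (n + 1)
    # reach[x][y]: probability that the walk visits the interior point (x, y)
    reach = [[0.0] * (a + n) for _ in range(n)]
    reach[0][0] = 1.0
    for x in range(n):
        right, up = lam(x), 1.0 - lam(x)
        for y in range(x + a):              # interior points satisfy y < x + a
            f = reach[x][y]
            if x + 1 == n:
                P[n] += f * right           # absorbed on the barrier x = n
            else:
                reach[x + 1][y] += f * right
            if y + 1 == x + a:
                P[x] += f * up              # absorbed on the barrier y = x + a
            else:
                reach[x][y + 1] += f * up
    return P

# Bailey's problem with n = 30, a = 2 and removal ratio 10 (illustrative values)
bailey = total_size_distribution(30, 2, lambda x: (30 - x) / (30 - x + 10.0))
```

The work grows like n(n + a) steps, so the sketch is adequate for moderate n; the recurrence relations discussed in the text remain preferable for analysis.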


* See pp. 116-22 above of the present issue. 




Thus we have a Birth-and-Death process with the parameters $\lambda_x$, $\mu_x$, and the novelty
resides in the fact that these parameters are position-dependent. It will be noted, however,
that they are assumed independent of the $y$-co-ordinate. We study the problem for quite
general parameters of this type.


3. We now introduce some notation for symmetric functions. Denote by $h_l(x, m)$ the
homogeneous product-sum of weight $l$ in the $(m+1)$ quantities $\mu_x, \mu_{x+1}, \ldots, \mu_{x+m}$ (cf.
MacMahon, 1915). When these quantities are assumed distinct, define


A(z, m) = WEE Mery Ha) <+- (ar¬ а-л) (Mag — Harj) oo (Ao — Ham): 


m 
Then we have the formula h(x, m) = ‚5 h(x, m). 
Now define $q_l(x, m) = \lambda_x \lambda_{x+1} \ldots \lambda_{x+m-1}\, h_l(x, m)$,
and P(t, m) = А„А„ л... А1706, m). 


In this notation, it may be verified that Whittle's set of recurrence relations becomes 
k 
Ty: 2 =i) Bs = 0000,0) (ko 0,1,...,n—1). 


4. As Whittle observes, in these relations we have to assume that the $\mu$'s are all distinct.
We proceed now to derive the general formula, using a simple probability argument.

When no barriers are present, the probability that the particle attains the point
$(x+m, y+l)$ from the point $(x, y)$ is $q_l(x, m)$. Now $P_k$ is the probability of attaining $(k, a+k)$
from (0, 0) by any path below the barrier. Therefore
$$P_k = q_{a+k}(0, k) - q_k(0, k)P_0 - q_{k-1}(1, k-1)P_1 - \ldots - q_1(k-1, 1)P_{k-1}.$$

We may rewrite this as the set of relations,
$$\sum_{j=0}^{k} q_{k-j}(j, k-j)\,P_j = q_{a+k}(0, k) \quad (k = 0, 1, \ldots, n-1).$$


This formula is valid quite generally, and Whittle's formula may be obtained from it 
by use of the following relations, which are easily verified: 


k 
py = D Ple ii) rei (k =0,1,...,.2—1). 


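5. As an example, we consider the special case where the $\lambda$'s and $\mu$'s are constant:
$$\lambda_x = \lambda, \qquad \mu_x = \mu.$$
Then
$$q_l(x, m) = \binom{l+m}{l} \lambda^m \mu^l.$$
Thus
$$\sum_{j=0}^{k} \binom{2k-2j}{k-j} (\lambda\mu)^{k-j} P_j = \binom{2k+a}{k} \lambda^k \mu^{k+a} \quad (k = 0, 1, \ldots, n-1),$$
which has the solution
$$P_k = \frac{a}{2k+a} \binom{2k+a}{k} \lambda^k \mu^{k+a}.$$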




It is interesting to note the connexion between the stochastic epidemic problem and
results related to the arc sine law in the theory of fluctuations in coin-tossing (cf. Feller,
1950, p. 252). Thus the case of constant $\lambda$, $\mu$ may be applied to the tossing of a biased coin
with probabilities $\lambda$, $\mu$ of heads or tails. Then $P_k$ is the probability that the number of heads
is greater than or equal to the number of tails plus $a$ for the first time at the $(2k+a)$th trial.
The probability
$$P_n = 1 - \sum_{k=0}^{n-1} P_k$$
is interpreted as the probability that in $(2n+a-1)$ trials the number of heads is always less
than the number of tails plus $a$. For the particular case $a = 1$, $\lambda = \mu = \tfrac12$, we have thus
$$P_n = 1 - \sum_{k=0}^{n-1} \frac{1}{2k+1} \binom{2k+1}{k} \frac{1}{2^{2k+1}},$$
and it may be verified that this equals $\binom{2n}{n} 2^{-2n}$, in agreement with the formula of Theorem 1
in Feller (1950, p. 252).
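
The verification mentioned for the case a = 1 with equal probabilities of one half can be carried out in exact arithmetic. The Python sketch below does so for small n; the function names are chosen for this example only.

```python
from fractions import Fraction
from math import comb

def p_n_from_sum(n):
    """1 - sum over k < n of P_k, with a = 1 and both probabilities equal to 1/2."""
    total = sum(Fraction(comb(2 * k + 1, k), (2 * k + 1) * 2 ** (2 * k + 1))
                for k in range(n))
    return 1 - total

def central_term(n):
    """The binomial expression C(2n, n) / 2^(2n)."""
    return Fraction(comb(2 * n, n), 4 ** n)

agree = all(p_n_from_sum(n) == central_term(n) for n in range(0, 40))
```

Because `Fraction` keeps every term exact, agreement here is an identity check for the listed values of n rather than a floating-point coincidence.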
REFERENCES 
BAILEY, NORMAN T. J. (1953). The total size of a stochastic epidemic. Biometrika, 40, 177.
FELLER, WILLIAM (1950). An Introduction to Probability Theory and its Applications. New York:
John Wiley and Sons.
MACMAHON, P. A. (1915). Combinatory Analysis. Cambridge University Press.
WHITTLE, P. (1955). The outcome of a stochastic epidemic - a note on Bailey's paper. Biometrika,
42, 116-22.




THE DETERMINISTIC MODEL OF A SIMPLE EPIDEMIC 
FOR MORE THAN ONE COMMUNITY 


By S. RUSHTON AND A. J. MAUTNER
Imperial College, London 


1. INTRODUCTION 


In general, a deterministic model can be expected to give a satisfactory picture of a devel- 
oping process such as the spread of an epidemic, or the growth of a population, so long as 
the numbers of individuals are sufficiently large. On the other hand, if the numbers are 
small the continuous description is no longer valid; moreover, the effects of chance occur- 
rences become appreciable in any particular instance, so that a probability treatment 
is necessary. The literature on deterministic and stochastic models for epidemics has 
been reviewed by Bailey (1952); to the references given there should be added Bailey
(1953a, b). Much of this previous work refers essentially to a single community or to a set
of isolated communities. In the present note we consider the deterministic model of a 
simple epidemic for several related communities. A simple epidemic is one in which it is 
assumed that infection spreads only by contact between individuals, and that none of the 
infected individuals is removed from circulation by death, recovery or isolation. We do 
not suggest that the model we discuss here is a realistic one in the sense that any actual 
epidemic has followed precisely this over-simplified pattern, but all epidemic models so 
far discussed have lacked realism in some sense. We do suggest, however, that the model 
discussed in this note can provide a basis for further work along these particular lines so 


that more realistic and therefore more complicated models can be treated by starting from 
our basic solution. 


2. THE DIFFERENTIAL EQUATIONS FOR m COMMUNITIES


We denote by $y_i(t)$ the number of susceptibles in the $i$th community at time $t$ and by $n_i$
the total size of the $i$th community ($i = 1, \ldots, m$). Then, on the assumptions (i) that there is
homogeneous mixing within each community with $\alpha_i$ as the internal infection rate in the
$i$th community and (ii) that there is also homogeneous mixing between communities with
$\beta_{ij}$ as the infection rate between the $i$th and $j$th communities, the differential equations
for the $y_i$'s are
$$\frac{dy_i}{dt} = -y_i\Big\{\alpha_i(n_i - y_i) + \sum_{j \ne i} \beta_{ij}(n_j - y_j)\Big\} \quad (i = 1, \ldots, m). \qquad (1)$$

Under the simplifying assumptions that $n_i = n$ and $\alpha_i = \alpha$ (all $i$), and that $\beta_{ij} = \alpha\gamma$
(all $i \ne j$), and taking $\alpha t$ as the time-scale and continuing to denote it by $t$, these equations
become
$$\frac{dy_i}{dt} = -y_i\Big\{(n - y_i) + \gamma \sum_{j \ne i} (n - y_j)\Big\} \quad (i = 1, \ldots, m). \qquad (2)$$
The parameter $\gamma$ in these equations is then the ratio of the assumed common 'cross-infection
rate' between communities to the common internal infection rate $\alpha$, which takes
account only of internally generated infection. The solutions of the set (2) are completely
determined by $m$ independent initial conditions, e.g. the $m$ values $y_i(0)$ ($i = 1, \ldots, m$), and
it follows that if any set of these initial values are equal then the corresponding functions
$y_i(t)$ are identical for all $t$. In particular, if all the $y_i(0)$ are equal, the set (2) reduces to a single
equation
$$\frac{dy}{dt} = -(1 + (m-1)\gamma)\, y\,(n - y)$$
for $y(t) = y_1(t) = \ldots = y_m(t)$. This is just the equation for a simple epidemic in a single
community with total infection rate $\alpha(1 + (m-1)\gamma)$.
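
The reduction to a single equation when all initial values coincide can be checked numerically. The Python sketch below integrates the set (2) by the classical fourth-order Runge-Kutta method (a choice made here for illustration, not a method used in the paper) and compares the result with the logistic solution of the single equation. All function names and the sample values are assumptions of this example.

```python
import math

def rhs(y, n, gamma):
    """Right-hand side of equations (2) for the list y of susceptibles."""
    infected_total = sum(n - yj for yj in y)
    return [-yi * ((n - yi) + gamma * (infected_total - (n - yi))) for yi in y]

def integrate(y0, n, gamma, t_end, steps):
    """Classical fourth-order Runge-Kutta integration of the set (2)."""
    h = t_end / steps
    y = list(y0)
    for _ in range(steps):
        k1 = rhs(y, n, gamma)
        k2 = rhs([yi + 0.5 * h * k for yi, k in zip(y, k1)], n, gamma)
        k3 = rhs([yi + 0.5 * h * k for yi, k in zip(y, k2)], n, gamma)
        k4 = rhs([yi + h * k for yi, k in zip(y, k3)], n, gamma)
        y = [yi + h * (p + 2 * q + 2 * r + s) / 6.0
             for yi, p, q, r, s in zip(y, k1, k2, k3, k4)]
    return y

def single_community(y0, n, rate, t):
    """Closed-form solution of dy/dt = -rate * y * (n - y)."""
    return n * y0 / (y0 + (n - y0) * math.exp(rate * n * t))

# Three communities of unit size, all starting with 95 % susceptible
numerical = integrate([0.95, 0.95, 0.95], n=1.0, gamma=0.2, t_end=2.0, steps=2000)
expected = single_community(0.95, 1.0, 1.0 + 2 * 0.2, 2.0)
```

The same `integrate` routine can be started from unequal initial values to explore the general case treated in the following sections.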

Another particular case, whose solution is considered below in §§ 4 and 5, occurs when
$(m-1)$ of the initial values $y_i(0)$ are equal and the remaining one, say $y_1(0)$, is not equal to
the others. For example, this case arises if a single infected person appears initially in only
one of the communities (say the first), so that
$$y_1(0) = n - 1 \quad\text{and}\quad y_2(0) = y_3(0) = \ldots = y_m(0) = n.$$
The equations (2) in such a case reduce to the pair of equations
$$\frac{dy_1}{dt} = -y_1\{(n - y_1) + (m-1)\gamma(n - y_2)\},$$
$$\frac{dy_2}{dt} = -y_2\{\gamma(n - y_1) + (1 + (m-2)\gamma)(n - y_2)\}, \qquad (3)$$
since $y_2(t) = y_3(t) = \ldots = y_m(t)$ for all $t$.


3. A GENERAL SOLUTION 


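If the initial values $y_i(0)$ ($i = 1, \ldots, m$) are all distinct we can obtain a solution of the
equations (2) as follows. Writing equations (2) in the form
$$\frac{1}{y_i}\frac{dy_i}{dt} = -an + \Big(y_i + \gamma\sum_{j \ne i} y_j\Big) \quad (i = 1, \ldots, m), \qquad (4)$$
where $a = 1 + (m-1)\gamma$, and making the transformations
$$y_i = nY_i, \quad s = nt, \quad Y_i = e^{-as}U_i, \qquad (5)$$
we obtain the equations
$$\frac{1}{U_i}\frac{dU_i}{ds} = e^{-as}\Big(U_i + \gamma\sum_{j \ne i} U_j\Big) \quad (i = 1, \ldots, m). \qquad (6)$$
Changing the independent variable to
$$v = (1 - e^{-as})/a, \qquad (7)$$
these equations become
$$\frac{1}{U_i}\frac{dU_i}{dv} = U_i + \gamma\sum_{j \ne i} U_j \quad (i = 1, \ldots, m). \qquad (8)$$

Denoting the matrix of coefficients on the right of these equations by $\mathbf{A}$, the vector of
elements $U_i$ by $\mathbf{U}$, and denoting a column vector by { }, this set is
$$\frac{d}{dv}\{\log U_i\} = \mathbf{A}\mathbf{U}. \qquad (9)$$
It follows that
$$\mathbf{A}^{-1}\frac{d}{dv}\{\log U_i\} = \Big\{\frac{d}{dv}\log \prod_{j} U_j^{a^{ij}}\Big\} = \mathbf{U}, \qquad (10)$$
where, corresponding to $\mathbf{A} = [a_{ij}]$, we denote $\mathbf{A}^{-1} = [a^{ij}]$.

Now, putting
$$X_i = \prod_{j=1}^{m} U_j^{a^{ij}}, \qquad (11)$$
so that
$$U_i = \prod_{j=1}^{m} X_j^{a_{ij}}, \qquad (12)$$
the equations (10) become
$$\frac{d}{dv}\log X_i = X_i^{1-\gamma}\Big(\prod_{j=1}^{m} X_j\Big)^{\gamma} \qquad (13)$$
for $i = 1, \ldots, m$. We then have
$$X_i^{\gamma-2}\frac{dX_i}{dv} = \Big(\prod_{j=1}^{m} X_j\Big)^{\gamma} = F(v), \text{ say}, \qquad (14)$$
for all $i$. Integrating these equations we obtain
$$X_i = (\alpha_i - G)^{-1/(1-\gamma)} \quad (i = 1, \ldots, m), \qquad (15)$$
where
$$G(v) = (1-\gamma)\int_0^v F(v)\,dv, \qquad (16)$$
and
$$\alpha_i = (X_i(0))^{\gamma-1}. \qquad (17)$$
From (12) and the definition of $F(v)$ in (14) we have
$$U_i = \Big(\prod_{j=1}^{m} X_j\Big)^{\gamma} X_i^{1-\gamma} \qquad (18)$$
$$\;\;= F(v)/(\alpha_i - G) \quad (i = 1, \ldots, m) \qquad (19)$$
from (15).
Since, from (14), (15) and (16),
$$\Big[\prod_{i=1}^{m} (\alpha_i - G)\Big]^{-\gamma/(1-\gamma)} = F(v) = \frac{1}{1-\gamma}\frac{dG}{dv}, \qquad (20)$$
we may obtain an integral expression for $v$ as
$$v = \frac{1}{1-\gamma}\int_0^G \Big\{\prod_{i=1}^{m} (\alpha_i - G)\Big\}^{\gamma/(1-\gamma)} dG. \qquad (21)$$

By means of (19) and (21), using the expression for $F(v)$ in terms of $G$ in (20), the solutions
for $U_i$ and the time parameter $v$ are given parametrically in terms of $G$.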


4. THE SPECIAL CASE OF EQUATIONS (3) 
In the special case mentioned at the end of §2 in which $y_2(0) = \ldots = y_m(0)$ and $y_1(0) \ne y_2(0)$,
the general equations (2) reduce to the pair of equations (3).
We see from (5) and (18) that these conditions imply that
$$U_2(0) = U_3(0) = \ldots = U_m(0), \quad U_1(0) \ne U_2(0)$$
and
$$X_2(0) = X_3(0) = \ldots = X_m(0), \quad X_1(0) \ne X_2(0).$$
Defining (X,(0))"1 = q, (22) 
-l1 ے‎ E dr 
it follows from (16) that Кел ае ов Sat 
$$U_1 = F(v)/(\alpha - G) \qquad (24)$$
and
$$U_i = F(v)/(\beta - G) \quad (i = 2, \ldots, m). \qquad (25)$$

Now put
$$Z = U_1/U_2, \quad W = U_2 - U_1, \qquad (26)$$
so that
$$U_2 = W/(1-Z), \quad U_1 = WZ/(1-Z), \qquad (27)$$
and note that if $y_2(0) > y_1(0)$, which we may take to be the case, then $y_2(t) > y_1(t)$ for all $t$,
and it follows that $U_2 > U_1$, so that $W > 0$ and $0 \le Z < 1$ for all $t$. Also we must have $\alpha > \beta$.
Expressing $W$ in terms of $Z$ by means of the above equations and the expression for $F(v)$
in (20), we obtain
$$W = A(1-Z)^{\mu\lambda}/Z^{(\mu-1)\lambda}, \qquad (28)$$
where $\lambda = 1/(1-\gamma)$, $\mu = 2 + (m-2)\gamma$ and $A$ is an arbitrary constant $= (\alpha - \beta)^{1-\mu\lambda}$.
From (20) we have
$$\frac{dv}{dG} = \frac{\lambda}{F(v)}.$$
It is easily seen that this is equivalent to
$$\frac{dv}{dZ} = -\frac{\lambda}{WZ}, \qquad (29)$$
so that, corresponding to (21), we may express $v$ in terms of $Z$ as
$$v = B - \frac{\lambda}{A}\int_0^Z (1-Z)^{-\mu\lambda} Z^{(\mu-1)\lambda-1}\,dZ, \qquad (30)$$
where $B$ is a second arbitrary constant. The integral in (30) is an Incomplete Beta Function
which can be expressed in terms of the hypergeometric function. In practice, however, it
is preferable to evaluate the integral in (30) directly.


5. PRACTICAL FORM OF THE SOLUTION 
Considering the practical form of the solution of §4, we note that if we choose a new
time-parameter
$$T = (1 + (m-1)\gamma)v = 1 - e^{-(1+(m-1)\gamma)s}, \qquad (31)$$
where we recall that we have put $s = nt$, then $0 \le T < 1$ and the whole history of the epidemic
is shown on a standard time scale extending over a unit interval. Equations (28) and (30)
constitute the solution to our problem, the solution being expressed parametrically in terms
of $Z = U_1/U_2 = Y_1/Y_2 = y_1/y_2$.

Defining
$$g(Z) = \{(1-Z)^{\mu}/Z^{\mu-1}\}^{\lambda}, \qquad (32)$$
then, from (28), (30) and (31),
$$W = A\,g(Z), \qquad (33)$$
and
$$T = C - (\lambda\mu - 1)\int_0^Z (1/WZ)\,dZ \qquad (34)$$
$$\;\;= C - \frac{\lambda\mu - 1}{A}\int_0^Z \frac{dZ}{Z\,g(Z)}. \qquad (35)$$
The initial values of $W$ and $Z$ are
$$W_0 = U_2(0) - U_1(0) = Y_2(0) - Y_1(0) = (y_2(0) - y_1(0))/n, \qquad (36)$$
and $Z_0 = y_1(0)/y_2(0)$. We notice incidentally that if $y_2(0) = n$, so that initially there are no
infected persons in any community other than the first, then $Z_0 = 1 - W_0$.
The arbitrary constants $A$ and $C$ are clearly given by
$$A = W_0/g(Z_0) \qquad (37)$$
and
$$C = \frac{\lambda\mu - 1}{A}\int_0^{Z_0} \frac{dZ}{Z\,g(Z)}. \qquad (38)$$


From equations (5) and (31) the proportions $Y_1$ and $Y_2$ of susceptibles in the
communities at any time are
$$Y_2 = (1-T)\,U_2 = (1-T)\,W/(1-Z) = A(1-T)\,g(Z)/(1-Z) \qquad (39)$$
in virtue of (27), and
$$Y_1 = ZY_2. \qquad (40)$$

Table 1. Numerical procedure: γ = 0.2, m = 2, Z_0 = 0.95

 Z      g(Z)           ∫_0^Z dZ/(Z g(Z))    T       s      Y_2     Y_1     -dY_2/ds

0.95    5.96 × 10^-4        57.03          0.000   0.00   1.000   0.950    0.010
0.94    9.53 × 10^-4        42.94          0.252   0.24   0.996   0.937    0.016
0.93    1.42 × 10^-3        33.73          0.417   0.45   0.992   0.923    0.023
0.92    2.01 × 10^-3        27.32          0.531   0.63   0.987   0.908    0.031
0.91    2.73 × 10^-3        22.65          0.615   0.80   0.981   0.893    0.039
0.90    3.61 × 10^-3        19.13          0.678   0.94   0.975   0.877    0.049
0.89    4.64 × 10^-3        16.40          0.727   1.08   0.968   0.862    0.057
0.88    5.85 × 10^-3        14.23          0.765   1.21   0.959   0.844    0.069
0.87    7.25 × 10^-3        12.47          0.797   1.33   0.951   0.827    0.080
0.86    8.85 × 10^-3        11.03          0.823   1.44   0.941   0.809    0.092
0.85    1.07 × 10^-2         9.83          0.844   1.55   0.930   0.791    0.104
0.84    1.27 × 10^-2         8.81          0.862   1.66   0.915   0.769    0.120
0.83    1.50 × 10^-2         7.95          0.878   1.75   0.907   0.753    0.129
0.82    1.76 × 10^-2         7.20          0.891   1.85   0.893   0.733    0.143
0.81    2.05 × 10^-2         6.56          0.903   1.94   0.880   0.713    0.157
0.80    2.36 × 10^-2         5.99          0.913   2.03   0.865   0.692    0.170
0.75    4.48 × 10^-2         3.99          0.949   2.47   0.774   0.580    0.240
0.70    7.70 × 10^-2         2.81          0.970   2.91   0.654   0.458    0.297
0.65    1.24 × 10^-1         2.05          0.983   3.41   0.500   0.325    0.318
0.60    1.92 × 10^-1         1.53          0.992   4.08   0.301   0.181    0.260
0.55    2.87 × 10^-1         1.16          0.999   5.89   0.046   0.025    0.052
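
The entries of Table 1 can be recomputed along the lines of the solution above. The following Python sketch evaluates g(Z), the integral (by composite Simpson's rule, a choice made here for illustration), and then T, s and the two proportions of susceptibles for the tabulated case with two communities. The function names, the number of Simpson panels and the reading of the integral as running from 0 to Z are assumptions of this example; small differences from the printed table are to be expected.

```python
import math

GAMMA, M = 0.2, 2
LAM = 1.0 / (1.0 - GAMMA)            # lambda = 1 / (1 - gamma)
MU = 2.0 + (M - 2) * GAMMA           # mu = 2 + (m - 2) gamma

def g(z):
    """g(Z); for m = 2 (mu = 2) this is {(1 - Z)^mu / Z}^lambda."""
    return ((1.0 - z) ** MU / z ** (MU - 1.0)) ** LAM

def integral(z, panels=20000):
    """Composite Simpson estimate of the integral of 1/(Z g(Z)) from 0 to z."""
    def f(u):
        return 0.0 if u == 0.0 else 1.0 / (u * g(u))   # integrand vanishes at 0
    h = z / panels
    total = f(0.0) + f(z)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3.0

def table_row(z, z0=0.95, w0=0.05):
    """Return (A, C, T, s, Y2, Y1) for a given Z, starting from Z_0 and W_0."""
    A = w0 / g(z0)
    factor = (LAM * MU - 1.0) / A
    C = factor * integral(z0)
    T = C - factor * integral(z)
    s = -math.log(1.0 - T) / (1.0 + (M - 1) * GAMMA)
    Y2 = A * (1.0 - T) * g(z) / (1.0 - z)
    return A, C, T, s, Y2, z * Y2

A, C, T, s, Y2, Y1 = table_row(0.94)
```

Direct quadrature is used because, as noted in §4, evaluating the integral directly is preferable in practice to working through the hypergeometric function.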


In terms of $Y_1$ and $Y_2$ and the time parameter $s$ ($0 < s < \infty$), where
$$s = -\{\log_e(1-T)\}/(1 + (m-1)\gamma), \qquad (41)$$
the epidemic rate $-dY_1/ds$ in the first community and $-dY_2/ds$ in the other communities
are given by
$$dY_1/ds = Y_1\big(Y_1 + (m-1)\gamma Y_2 - (1 + (m-1)\gamma)\big),$$
$$dY_2/ds = Y_2\big(\gamma Y_1 + (1 + (m-2)\gamma)Y_2 - (1 + (m-1)\gamma)\big), \qquad (42)$$
and the total epidemic rate is $-(dY_1/ds + (m-1)\,dY_2/ds)$.
For communities of total size $n$ the actual epidemic rates are just $-dy_i/dt = -n^2\,dY_i/ds$.
The numerical procedure in applying the solution is illustrated in Table 1 for the simple
case of $m = 2$ communities with $\gamma = 0.2$ and initially 5 % of infected individuals in the first
community and none in the second. In this case $Y_1(0) = 0.95$, $Y_2(0) = 1$, $W_0 = 0.05$, $Z_0 = 0.95$,
$\lambda = \tfrac54$ and $\mu = 2$. The arbitrary constants are $A = 83.8712$ and $C = 1.0199$. The principal
part of the numerical work is the computing of $g(Z)$ and the integral $\int_0^Z dZ/(Zg(Z))$. In this
example, the epidemic is effectively completed by the time that $Z$ has decreased in value
to 0.55, when $T = 0.999$, and the values of $g(Z)$ and $\int_0^Z dZ/(Zg(Z))$ are therefore only shown
for $0.95 \ge Z \ge 0.55$. If the epidemic rates $-dY_1/ds$ and $-dY_2/ds$ are plotted the curve for
$-dY_2/ds$ is at first below that for $-dY_1/ds$ as we should expect, crosses the latter at about its
maximum, rises to a slightly higher maximum and remains above the curve for $-dY_1/ds$
until the completion of the epidemic. The curve for the total epidemic rate falls somewhat
more slowly from its maximum than the rate at which it rises. We have examined a number
of cases, for different numbers $m$ of communities and different values of $\gamma$, and this last
feature appears to be generally true and is particularly noticeable for small $m$ (e.g. $m = 2$)
and reasonably large values of the initial amount of infection in one community (say
10%) before infection in that community begins to affect susceptibles in the others.
This calls to mind the remark of Ross (1916): "It is obvious from the mere examination of
many curves of epidemics that they are often remarkably symmetrical bell-shaped curves
which, however, frequently tend to fall somewhat more slowly than they rise..."

Fig. 1. ($Y_1$, $Y_2$) curves for $m = 2$ and $\gamma = 0.2$ and different initial values:
$Y_1(0)$ = 0.99, 0.97, 0.95, 0.90, 0.80, 0.70 and 0.50 and $Y_2(0) = 1$ in all cases.

A method of illustrating the solutions of the epidemic equations is shown in Fig. 1, where
for different initial conditions ($Y_1$, $Y_2$) curves are plotted for the case $m = 2$ and $\gamma = 0.2$.
The curves shown are for $Y_1(0)$ = 0.99, 0.97, 0.95, 0.90, 0.80, 0.70 and 0.50, and $Y_2(0) = 1$ in
all cases. Contours for the time parameter $s$, giving the times that points on the ($Y_1$, $Y_2$)
curves are reached from the start of the epidemic, and contours showing the value of the
total epidemic rate as the epidemic proceeds can also be put on this same graph. To avoid
confusion they have not been included in Fig. 1, but the latter are in fact conics. In this
particular example, we see from equations (42) with $m = 2$, that the contours
$$-(dY_1/ds + dY_2/ds) = k$$
are concentric ellipses
$$Y_1^2 + 2\gamma Y_1 Y_2 + Y_2^2 - (1+\gamma)(Y_1 + Y_2) = k,$$
having their common centre at the point $Y_1 = Y_2 = \tfrac12$, their minor axes along the line $Y_1 = Y_2$
of lengths $\sqrt{\{(1+\gamma+2k)/(2(1+\gamma))\}}$ and their major axes (perpendicular to $Y_1 = Y_2$) of
lengths equal to $\sqrt{\{(1+\gamma)/(1-\gamma)\}}$ times the corresponding minor axes.


REFERENCES 


BAILEY, N. T. J. (1952). Appl. Statist. 1, 149.
BAILEY, N. T. J. (1953a). Biometrika, 40, 177.
BAILEY, N. T. J. (1953b). Biometrika, 40, 279.
ROSS, R. (1916). Proc. Roy. Soc. A, 92, 204.




EXACT TESTS FOR SERIAL CORRELATION 


By E. J. HANNAN 
Australian National University, Canberra, A.C.T. 


1. INTRODUCTION 


In testing for first-order serial correlation in a stationary time series it is customary to use 
some form of the first sample serial correlation coefficient. Though this statistic will, 
presumably, be a most efficient estimator of the serial correlation p it has two disadvantages: 
(a) its exact distribution is very complicated so that approximations have to be used; 
(b) it is a biased estimator and the analytical form of the bias is not known. 

Ogawara (1951) has obtained an exact test for serial correlation by considering the
conditional distribution of the $x_{2t}$ for fixed values of the $x_{2t \pm 1}$. His statistic $\hat b_1$ is an unbiased
estimator of $2\rho(1+\rho^2)^{-1}$, where $\rho$ is the parameter of a simple Markoff process. At first sight
it seems obvious that the resulting estimator of $\rho$ will be inefficient. However, in this paper
it will be shown that its efficiency tends to 1 as $\rho$ tends to 0. It follows that Ogawara's
statistic leads to an exact test of the hypothesis which is asymptotically fully efficient (this
criterion is defined and justified below). Computationally the test is as simple as that based
on the first serial correlation coefficient.

Another statistic, $\hat b_2$, is available which is also an unbiased estimator of $2\rho(1+\rho^2)^{-1}$.
This statistic is the coefficient of regression of the $x_{2t+1}$ on the neighbouring $x_{2t}$. The efficiency
of the estimator of $\rho$ based on the statistic $\tfrac12(\hat b_1 + \hat b_2)$ is here shown to be $(1 - \rho^2)$. While no
exact distribution theory is available for this statistic it could be useful in circumstances
where an estimate of a common serial correlation coefficient is required from a number of
otherwise differing simple Markoff processes, since it is also an unbiased estimator of
$2\rho(1+\rho^2)^{-1}$.

Ogawara also gave the extension of his results to the case of a multiple Markoff process 
of order h. In general, however, it appears that the efficiency of his estimates of the coeffi- 
cients of the autoregressive equation will not rise above 2/(h+ 1), so that only in the first- 
order case are they ever fully efficient. 

The problem of testing for serial correlation in the residuals from a regression equation 
has been considered by Moran (1950) and Durbin & Watson (1950, 1951). Moran obtained 
asymptotic formulae for the mean and variance of the first circular serial correlation coeffi- 
cient computed from the residuals from a least-squares regression on a single independent 
variate. The exact distribution of the test statistic used by Durbin & Watson is of а com- 
plicated form and is only known for regression vectors satisfying certain conditions. How- 
ever, upper and lower bounds to the significance points, valid for any regression vectors, 
have been tabulated. In cases where the test statistic falls between the appropriate bounds 
an approximate test was suggested, based on the use of a beta distribution with the same 
mean and variance as the test statistic. 

In this paper it will be shown that Ogawara's method can be extended to give an exact
test, for the independence of the residuals from the regression equation, based on the z
distribution. Again, a statistic is obtained which is an unbiased estimator of $2\rho(1+\rho^2)^{-1}$
in the case where the residuals follow a first-order Markoff scheme. The estimator is, as $\rho$
tends to zero, asymptotically as efficient an estimator of $\rho$ as the first serial correlation
coefficient of the residuals and Durbin & Watson's statistic. It again follows that the
statistic leads to a test which is asymptotically fully efficient.

At the same time an exact test against given values of the regression coefficients is 
obtained in the case in which the residuals follow a simple Markoff process. 

One disadvantage of these tests results from the necessity of computing a separate
least-squares regression involving 2k+1 regressors, where k is the number of 'independent'
variates in the original regression equation. The estimates of the coefficients in the original 
equation, obtained from this regression, will, under some circumstances, have an asymp- 
totically smaller variance than the estimates obtained by straightforward least squares. 
However, over a wide range of these conditions the estimates of the regression coefficients 
obtained from the first differences of the observations will have an even smaller asymptotic 
variance, and over this range at least the variances of these last estimates will almost 
certainly be smaller for small samples. 

Finally, it should be emphasized that, though certain exact tests have been obtained, the
powers of these tests have been judged from their asymptotic properties. For really small
samples these tests may be far from optimal.


2. THE CRITERION OF ASYMPTOTIC RELATIVE EFFICIENCY OF TESTS 


If $t_1$ and $t_2$ are statistics, computed from a sample of size $n$, their asymptotic relative efficiency as
tests of an hypothesis specified by a parameter value $\theta_0$ is defined as

$$E(t_1, t_2 \mid \theta_0) = \lim_{n\to\infty} \frac{\{\partial E(t_1)/\partial\theta\}^2_{\theta=\theta_0} \,/\, V(t_1 \mid \theta_0)}{\{\partial E(t_2)/\partial\theta\}^2_{\theta=\theta_0} \,/\, V(t_2 \mid \theta_0)}.$$

Here $E(t_i)$ and $V(t_i)$ are respectively the expected value and variance of $t_i$.

The justification for this criterion rests upon a theorem due to Pitman (see Stuart, 1954).
Pitman has shown that, under certain regularity conditions, for $t_1$ and $t_2$ with limiting normal
distributions and variances of order $n^{-1}$, the ratio of the sample sizes required for $t_1$ and $t_2$
to have the same power against alternative values of $\theta$ which differ from $\theta_0$ by quantities
of order $n^{-1/2}$ is in the limit given by $E(t_1, t_2 \mid \theta_0)$.
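As a minimal sketch of how this criterion is applied, the ratio can be evaluated from the derivative of each statistic's expectation and its limiting variance. The function name `pitman_are` and its arguments are illustrative assumptions, not notation from the paper; the inputs are the limiting values quoted in Section 3 for Ogawara's $\hat b_1$ and for the serial correlation coefficient $r_1$ at $\rho = 0$.

```python
def pitman_are(dmean1, var1, dmean2, var2):
    """Asymptotic relative efficiency of t1 to t2: ratio of (dE/dtheta)^2 / V."""
    return (dmean1 ** 2 / var1) / (dmean2 ** 2 / var2)

n = 100  # hypothetical number of conditioned observations (2n observations in all)

# Ogawara's b1 estimates b = 2*rho/(1 + rho^2): d b / d rho = 2 at rho = 0, variance 2/n.
# The serial correlation r1 estimates rho itself: derivative 1, variance 1/(2n).
are_at_zero = pitman_are(2.0, 2.0 / n, 1.0, 1.0 / (2 * n))
print(are_at_zero)
```

Under these assumed inputs the ratio does not depend on `n`, which is what makes it usable as a large-sample criterion.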


3. THE TEST FOR SERIAL CORRELATION IN A STATIONARY AUTOREGRESSIVE PROCESS 
Ogawara considered the stochastic process $(x_t - m) = \rho(x_{t-1} - m) + \epsilon_t$, where $\epsilon_t$ is normally
distributed with 0 mean and variance $\sigma^2(1-\rho^2)$, and $m$ is the mean of the process. Here
$|\rho| < 1$.

He showed that in the conditional distribution of the $x_{2t}$ $(t = 1, \ldots, n)$ for fixed values of
$x_{2t-1}$ $(t = 1, \ldots, n+1)$, the parameter $b = 2\rho(1+\rho^2)^{-1}$ appears as the regression coefficient
of the $x_{2t}$ on the fixed variates $\tfrac12(x_{2t-1} + x_{2t+1}) = x'_t$. Ogawara pointed out that the statistic

$$\hat b_1 = \frac{\sum (x_{2t} - \bar x)(x'_t - \bar x')}{\sum (x'_t - \bar x')^2}$$

is then the maximum-likelihood estimator of $b$, and is unbiased, and that

$$F = \frac{(\hat b_1 - b)^2 \sum (x'_t - \bar x')^2}{\sum (x_{2t} - \hat a - \hat b_1 x'_t)^2}\,(n-2),$$

which has the $F$ distribution with 1 and $n-2$ degrees of freedom, can be used as an exact
test for any assigned value of $b$. Here $\hat a = \bar x - \hat b_1 \bar x'$.
For the test of significance of the hypothesis $\rho = 0$ the statistic $F$ reduces to

$$(n-2)\,r^2(1-r^2)^{-1},$$

where $r$ is the coefficient of correlation between $x_{2t}$ and $\tfrac12(x_{2t-1} + x_{2t+1})$. This test is, therefore,
computationally as simple as that based on the first serial correlation coefficient.
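The computation described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `ogawara_zero_test`, the simulated series and its parameter values are hypothetical, and the odd-numbered observations are taken as the fixed ones.

```python
import numpy as np

def ogawara_zero_test(x):
    """Return (b1, r, F, n) for the conditional test of rho = 0 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 == 0:                   # work with 2n + 1 observations
        x = x[:-1]
    n = (x.size - 1) // 2
    fixed = x[0::2]                       # x_1, x_3, ..., x_{2n+1}: held fixed
    mid = x[1::2]                         # x_2, x_4, ..., x_{2n}
    xp = 0.5 * (fixed[:-1] + fixed[1:])   # x'_t = (x_{2t-1} + x_{2t+1}) / 2
    dx, dm = xp - xp.mean(), mid - mid.mean()
    b1 = np.sum(dm * dx) / np.sum(dx ** 2)
    r = np.sum(dm * dx) / np.sqrt(np.sum(dm ** 2) * np.sum(dx ** 2))
    F = (n - 2) * r ** 2 / (1.0 - r ** 2)  # refer to F with 1 and n - 2 d.f.
    return b1, r, F, n

# Simulated simple Markoff series (assumed values, for illustration only).
rng = np.random.default_rng(0)
rho = 0.6
eps = rng.normal(size=401)
series = np.empty(401)
series[0] = eps[0] / np.sqrt(1 - rho ** 2)
for t in range(1, 401):
    series[t] = rho * series[t - 1] + eps[t]

b1, r, F, n = ogawara_zero_test(series)
print(b1, r, F, n)
```

The returned `F` coincides with the usual regression $F$ for the slope of $x_{2t}$ on $x'_t$, so any standard simple-regression routine could be used instead.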
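The variance of $\hat b_1$, in the conditional distribution of the $x_{2t}$, is

$$\frac{\sigma^2(1-\rho^2)}{(1+\rho^2)}\,\frac{1}{\sum (x'_t - \bar x')^2},$$

and since this converges in probability as $n$ increases to

$$\frac{\sigma^2(1-\rho^2)}{(1+\rho^2)}\,\frac{2}{n\sigma^2(1+\rho^2)} = \frac{2(1-\rho^2)}{n(1+\rho^2)^2},$$

the conditional variance of $\hat b_1$ also converges in probability to†

$$\frac{2}{n}\,\frac{1-\rho^2}{(1+\rho^2)^2}.$$

Given $\hat b_1$, we can estimate $\rho$ by

$$\hat\rho_1 = \frac{1 - \sqrt{(1 - \hat b_1^2)}}{\hat b_1}.$$

This is not, of course, unbiased. Its variance will be, in the limit,

$$\frac{2}{n}\,\frac{(1-\rho^2)}{(1+\rho^2)^2}\,\frac{(1+\rho^2)^4}{4(1-\rho^2)^2} = \frac{(1+\rho^2)^2}{2n(1-\rho^2)}.$$

The variance of $r_1$, the first sample serial correlation coefficient, will tend to $\tfrac12(1-\rho^2)n^{-1}$
(Bartlett, 1946), and since it seems that this will be asymptotically most efficient (Wald,
1948) the efficiency of Ogawara's estimate, $\hat\rho_1$, will be

$$\left(\frac{1-\rho^2}{1+\rho^2}\right)^2.$$

The efficiency of $\hat\rho_1$ for certain values of $\rho$ is shown in Table 1.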


Table 1 


$|\rho|$


Efficiency of $\hat\rho_1$


† For any observed $\hat b_1$ the variance, for fixed $x_{2t-1}$, will be $\sigma^2(1-\rho^2)\{(1+\rho^2)\sum (x'_t - \bar x')^2\}^{-1}$, of course.
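The relations between $b$, $\rho$ and the efficiency of $\hat\rho_1$ can be checked numerically. The following is a small sketch; the function names are illustrative assumptions, and the values of $\rho$ printed are arbitrary choices rather than the entries of Table 1.

```python
import math

def b_from_rho(rho):
    """b = 2*rho / (1 + rho^2)."""
    return 2.0 * rho / (1.0 + rho * rho)

def rho_from_b(b):
    """Invert b = 2*rho/(1 + rho^2) on |rho| < 1; requires |b| <= 1."""
    if b == 0.0:
        return 0.0
    return (1.0 - math.sqrt(1.0 - b * b)) / b

def efficiency_rho1(rho):
    """Limiting variance of r1 divided by that of rho-hat-1."""
    return ((1.0 - rho ** 2) / (1.0 + rho ** 2)) ** 2

for rho in (0.2, 0.5, 0.8):
    print(rho, round(rho_from_b(b_from_rho(rho)), 6), round(efficiency_rho1(rho), 3))
```

In practice an observed $\hat b_1$ may exceed 1 in absolute value, in which case the square root is not real; the sketch leaves that case to the caller.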


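Another unbiased estimator of $b$ is available in the statistic

$$\hat b_2 = \frac{\sum (x''_t - \bar x'')\,x_{2t+1}}{\sum (x''_t - \bar x'')^2},$$

where $x''_t = \tfrac12(x_{2t} + x_{2t+2})$.
The statistic $\hat b = \tfrac12(\hat b_1 + \hat b_2)$ will also be unbiased and its variance will be

$$\tfrac14\{\operatorname{var}(\hat b_1) + \operatorname{var}(\hat b_2) + 2\operatorname{cov}(\hat b_1, \hat b_2)\} = \frac{(1-\rho^2)}{n(1+\rho^2)^2} + \tfrac12\operatorname{cov}(\hat b_1, \hat b_2).$$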


To the order $n^{-1}$ the sample means can be neglected and the covariance of $\hat b_1$ and $\hat b_2$
will be (to this order):
1 р? p 
[y COV (a, a5) + a+, COV (сс) — a+ {cov (a, ¢,) + cov (с,а,)), 


1 


n І "-1 
where а= тїп Da naa +), Lg zr X ope (oy + taa), 
1 


1 n-1 
G LP (a-r tta) = ой» > (Жы + Ta). 
By straightforward algebra we obtain, to the order $n^{-1}$,
j2 4. 956 2 2 4. 6. 8 
sov (f,6,) = 2 | 1 Е +100 + 6р%— 2р | p s. 440p! 8р6 — 8p | 


^ +e E (1+p?)4 1-pi 


‚КУУР, [72 20p? + 4р5 — | 
(1+p?) 


1—р% 
The asymptotic variance of $\hat b$, therefore, is

$$\operatorname{var}(\hat b) = \frac{2(1-\rho^2)^2}{n(1+\rho^2)^4}.$$

Hence, the variance of the estimator $\hat\rho = [1 - \sqrt{(1-\hat b^2)}]/\hat b$ tends to $(2n)^{-1}$, so that its
efficiency is $(1-\rho^2)$.


Table 2 shows the efficiency of this statistic for certain values of p. 


Table 2

|ρ|                   0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
Efficiency of ρ̂       0.96   0.91   0.84   0.75   0.64   0.51   0.36   0.19
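The efficiency $1-\rho^2$ and the pooling of estimates from several series can be illustrated as below. This is a sketch under assumptions: `b_hat_pair`, `pooled_rho`, the number of series and their lengths are all hypothetical, and the second regression follows the form of $\hat b_2$ with the even-numbered observations held fixed.

```python
import numpy as np

def b_hat_pair(x):
    """Slopes of the two conditional regressions for one series of 2n + 1 values."""
    x = np.asarray(x, dtype=float)
    odd, even = x[0::2], x[1::2]
    def slope(target, neighbours):
        nb = 0.5 * (neighbours[:-1] + neighbours[1:])
        d = nb - nb.mean()
        return np.sum(d * target) / np.sum(d ** 2)
    b1 = slope(even, odd)        # even-numbered on means of adjacent odd-numbered
    b2 = slope(odd[1:-1], even)  # interior odd-numbered on means of adjacent even-numbered
    return b1, b2

def pooled_rho(series_list):
    """Average the unbiased b-hats over series, then convert once to rho."""
    b = np.mean([0.5 * sum(b_hat_pair(s)) for s in series_list])
    return (1.0 - np.sqrt(1.0 - b ** 2)) / b

efficiency = {rho: 1.0 - rho ** 2 for rho in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)}

rng = np.random.default_rng(3)
def markoff(n_obs, rho):
    x = np.empty(n_obs)
    x[0] = rng.normal() / np.sqrt(1 - rho ** 2)
    for t in range(1, n_obs):
        x[t] = rho * x[t - 1] + rng.normal()
    return x

rho_pooled = pooled_rho([markoff(201, 0.5) for _ in range(20)])
print(rho_pooled)
```

Averaging the $\hat b$ values before converting to $\rho$ is the design choice suggested by the lack of bias in $\hat b$; averaging the individual $\hat\rho$ values instead would carry each series' bias into the pooled figure.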


Though no exact distribution theory is available for this statistic, yet, in spite of its
inefficiency, it could prove useful in circumstances where an estimate of a common serial
correlation coefficient is required from a number of otherwise differing simple Markoff
processes. The lack of bias in $\tfrac12(\hat b_1 + \hat b_2)$ would enable the several estimates to be combined†
and an estimator of $\rho$ to be obtained which would be consistent. The loss of efficiency, for
moderate $\rho$, would not be large.

† An example (due to E. J. Williams) has arisen in connexion with the serial correlation of measurements
at points along a log. The averaging of estimates from different logs, the only method of substantially
increasing the number of observations, would not result in a consistent estimator if the
estimates were obtained from the first sample serial correlation coefficient.

The statistics $\hat b_1$, $\hat b_2$ and $\hat b = \tfrac12(\hat b_1 + \hat b_2)$ have one major disadvantage however. The use of
the formula $\hat\rho_j = [1 - \sqrt{(1 - \hat b_j^2)}]/\hat b_j$
will lead to a consistent estimator of the first serial correlation coefficient only if the underlying
process is truly a simple Markoff process.

The fact that the efficiency of $\hat\rho_j$ tends to 1 as $\rho$ tends to 0 is important, however, for this
shows that the test of significance based on $\hat b_j$ $(j = 1, 2)$ will be asymptotically fully efficient.
Though nothing certain can be said, from this result, about the power of the test in small
samples it seems likely that it will be nearly as powerful as that based on $r_1$ against the range
of alternatives represented by simple Markoff processes. The test based on $\hat b_j$ has the
advantage of being exact.

Ogawara extended his results to a multiple Markoff process of order $h$. In this case the
regression of the variate $x_{(h+1)t}$ on the variates $x'_{p,t} = \tfrac12(x_{(h+1)t-p} + x_{(h+1)t+p})$ for $p = 1, \ldots, h$
is considered. In the first-order case the basic test statistic is the correlation coefficient
between $x_{2t}$ and $\tfrac12(x_{2t-1} + x_{2t+1})$, and the numerator of this statistic (apart from the factor $\tfrac12$)
differs only in the corrections for the means from the numerator of the first sample serial
correlation coefficient. The corresponding test statistics in the general case are the total and
partial correlation coefficients between $x_{(h+1)t}$ and the $x'_{p,t}$. Now, however, certain cross-products
are omitted as compared with the total and partial sample serial correlation
coefficients so that the power of the tests is reduced. It seems that the efficiency of the
estimates of the coefficients of the autoregressive equation (from which the asymptotic
power of the tests can be gauged) will be at a maximum when all of the coefficients are zero.
In this case the efficiency is $2(h+1)^{-1}$. In general, it will be lower. For example, the efficiency
of the estimate of ау, in the process 2;-- 212,1 +431, 5 = 6j in саве d, = 0, is 3(1 +03). 
It appears, therefore, that although exact tests for partial serial correlation are obtained
by Ogawara's approach these tests will be less powerful than those provided by the usual
serial coefficients.


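4. THE TEST FOR SERIAL CORRELATION IN THE RESIDUALS
FROM A REGRESSION EQUATION
Consider the regression

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \ldots + \beta_k x_{kt} + \epsilon_t,$$

where (1) $\epsilon_t = \rho\epsilon_{t-1} + \eta_t$ and $\eta_t$ is $N(0, \sigma(1-\rho^2)^{1/2})$; $|\rho| < 1$.
(2) The $\epsilon_t$ are independent of the $x_{jt}$.

Then the conditional distribution of the $y_{2t}$ for fixed $x_{j,t}$ and fixed $y_{2t+1}$ $(t = 0, \ldots, n)$ is

$$(2\pi\sigma^{*2})^{-\frac12 n}\exp\left[-\frac{1}{2\sigma^{*2}}\sum_{t=1}^{n}\Big(y_{2t} - \gamma_0 - \sum_{j=1}^{2k+1}\gamma_j z_{jt}\Big)^2\right],$$

where $\sigma^{*2} = \sigma^2(1-\rho^2)(1+\rho^2)^{-1}$, $\gamma_0 = \alpha(1-b)$,
$\gamma_j = \beta_j$ and $z_{jt} = x_{j,2t}$ $(j = 1, \ldots, k)$,
$\gamma_{k+1} = b$ and $z_{k+1,t} = \tfrac12(y_{2t-1} + y_{2t+1})$,
$\gamma_j = -\gamma_{k+1}\beta_{j-k-1}$ and $z_{jt} = \tfrac12(x_{j-k-1,2t-1} + x_{j-k-1,2t+1})$ $(j = k+2, \ldots, 2k+1)$.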


If the information involved in the knowledge that $\gamma_{j+k+1} = -\gamma_{k+1}\gamma_j$ $(j = 1, \ldots, k)$ is
discarded and the coefficients $\gamma_j$ are estimated by the straightforward least-squares procedure,
these estimates will have the usual properties of least-squares estimates in the classic
case. They will not be maximum-likelihood estimates. However, the computation of the
maximum-likelihood estimates will involve the solution of a system of second-order
equations and no exact tests will be available.

Representing the least-squares estimates of the $\gamma_j$ by $\hat\gamma_j$, the hypothesis $\rho = 0$ can be
tested by the use of the statistic

$$F_{1,\,n-2k-2} = (n-2k-2)\,\frac{\hat\gamma_{k+1}^2\,|L|}{\hat\sigma^{*2}\,|L_{k+1,k+1}|},$$

which has the $F$ distribution with 1 and $n-2k-2$ degrees of freedom. Here $\hat\sigma^{*2}$ is the
variance of the residuals from the least-squares regression of $y_{2t}$ on the $z_{jt}$ while $L$ is the
covariance matrix of the $z_{jt}$: $|L_{k+1,k+1}|$ is the cofactor of the element in the indicated row
and column. The test is, of course, equivalent to a test of the partial correlation of $y_{2t}$ and
$\tfrac12(y_{2t-1} + y_{2t+1})$ with the effects of the $z_{jt}$ $(j \neq k+1)$ removed.
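The regression described above can be sketched with an ordinary least-squares routine. This is an illustration under assumptions, not the author's computation: the function name `conditional_regression_test`, the simulated data and all parameter values are hypothetical, and the $F$ value is obtained from the usual estimated variance of a regression coefficient rather than from determinants.

```python
import numpy as np

def conditional_regression_test(y, X):
    """Regress y_{2t} on x_{j,2t}, the neighbour mean of y and the neighbour means
    of the x_j; return (coefficients, F for the neighbour-mean-of-y term, d.f.)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    if len(y) % 2 == 0:                      # keep 2n + 1 observations
        y, X = y[:-1], X[:-1]
    n, k = (len(y) - 1) // 2, X.shape[1]
    y_mid = y[1::2]                          # y_2, y_4, ..., y_{2n}
    z_own = X[1::2]                          # x_{j,2t}
    z_y = 0.5 * (y[0:-1:2] + y[2::2])        # (y_{2t-1} + y_{2t+1}) / 2
    z_nb = 0.5 * (X[0:-1:2] + X[2::2])       # (x_{j,2t-1} + x_{j,2t+1}) / 2
    D = np.column_stack([np.ones(n), z_own, z_y, z_nb])
    coef, *_ = np.linalg.lstsq(D, y_mid, rcond=None)
    resid = y_mid - D @ coef
    dof = n - 2 * k - 2
    cov = (resid @ resid / dof) * np.linalg.inv(D.T @ D)
    F = coef[k + 1] ** 2 / cov[k + 1, k + 1]
    return coef, F, dof

# Simulated illustration (all values hypothetical): two regressors, Markoff errors.
rng = np.random.default_rng(1)
N, rho = 201, 0.7
X = rng.normal(size=(N, 2))
e = np.zeros(N)
for t in range(1, N):
    e[t] = rho * e[t - 1] + rng.normal(scale=0.1)
y = 1.0 + X @ np.array([0.8, -0.7]) + e
coef, F, dof = conditional_regression_test(y, X)
print(F, dof)
```

The returned `F` is referred to the $F$ distribution with 1 and `dof` degrees of freedom; `coef[1:k+1]` are the estimates of the original regression coefficients.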

Tests of significance and confidence limits for the parameters $\beta_j$ can be obtained from
statistics such as

$$F_{1,\,n-2k-2} = (n-2k-2)\,\frac{(\hat\gamma_j - \beta_j)^2\,|L|}{\hat\sigma^{*2}\,|L_{jj}|} \qquad (j = 1, \ldots, k),$$

which have the $F$ distribution with 1 and $(n-2k-2)$ degrees of freedom. Similarly, the
multiple correlation of $y$ and the $x_j$ may be tested by the statistic
E a CELA 
ba UE. P ЖУ» [А | 
Here $A$ is the matrix formed from the first $k$ rows and columns of $L^{-1}$.
These last tests are exact for any value of $\rho$, though the power of the tests will depend
upon that parameter.
In order to examine the limiting variances and covariances of the estimates of the $\gamma_j$ it
is necessary to obtain the stochastic limit of the matrix $L$.


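If the vector $\{z_{jt}\}$ is written $Z_t = \{Z_{1t}, z_{k+1,t}, Z_{2t}\}$ and the corresponding vector of sample
means $\bar Z = \{\bar Z_1, \bar z_{k+1}, \bar Z_2\}$, then the covariance matrix $L$ is

$$L = \frac{1}{n}\sum_t \begin{bmatrix} (Z_{1t}-\bar Z_1)Z'_{1t} & (Z_{1t}-\bar Z_1)z_{k+1,t} & (Z_{1t}-\bar Z_1)Z'_{2t} \\ (z_{k+1,t}-\bar z_{k+1})Z'_{1t} & (z_{k+1,t}-\bar z_{k+1})z_{k+1,t} & (z_{k+1,t}-\bar z_{k+1})Z'_{2t} \\ (Z_{2t}-\bar Z_2)Z'_{1t} & (Z_{2t}-\bar Z_2)z_{k+1,t} & (Z_{2t}-\bar Z_2)Z'_{2t} \end{bmatrix}.$$

Since $z_{k+1,t} = \alpha + Z'_{2t}\beta + \tfrac12(\epsilon_{2t-1} + \epsilon_{2t+1})$, where $\beta$ is the vector of regression coefficients,
under fairly weak restrictions on the nature of the processes generating the $x_{jt}$, this matrix
will have a limit in probability of the form

$$\begin{bmatrix} X_0 & X_1\beta & X_1 \\ \beta'X'_1 & \tfrac12\sigma^2(1+\rho^2) + \beta'X_2\beta & \beta'X_2 \\ X'_1 & X_2\beta & X_2 \end{bmatrix},$$

where $X_0$, $X_1$ and its transpose and $X_2$ are the stochastic limits of the matrices in the corners
of $L$.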

Since the determinantal value of $L$ will be almost everywhere different from 0 (if the
joint distribution of the $z_{jt}$ is not singular) the matrix $L^{-1}$ will converge in probability to the
inverse of the matrix just written.

Since the covariance matrix of the $\hat\gamma_j$ (for $j > 0$) is $\sigma^{*2}(nL)^{-1}$, it is readily seen that the
variance of $\hat\gamma_{k+1}$ converges in probability to

$$\frac{2}{n}\,\frac{1-\rho^2}{(1+\rho^2)^2}.$$

Hence the variance of the estimator of $\rho$ obtained from this statistic is

$$\frac{1}{2n}\,\frac{(1+\rho^2)^2}{(1-\rho^2)},$$

as in the case of the estimation of the parameter of a simple Markoff process by Ogawara's
method.

The test statistic, $d$, used by Durbin & Watson, is asymptotically equal to $2(1 - r_1)$,
where $r_1$ is the first serial correlation coefficient computed from the actual residuals from
a least-squares regression. It can be seen from the formulae (5), (6), (7) and (8) given on
p. 164 of their paper (1951) that the variance of this statistic (when $\rho = 0$) will be, asymptotically,
$2n^{-1}$ when computed from $2n$ observations. It follows that the estimator of $\rho$ obtained
from $\hat\gamma_{k+1}$ is as efficient as that gained from the statistic $d$, when $\rho = 0$. The asymptotic
relative efficiency of $d$ and $\hat\gamma_{k+1}$ against the hypothesis $\rho = 0$ is therefore unity. It is easy to
show that the maximum-likelihood estimator of $\rho$ will have an asymptotic variance $(2n)^{-1}$
(from $2n$ observations) when $\rho$ equals zero. It therefore appears that $d$ and $\hat\gamma_{k+1}$ lead to tests
of the hypothesis, $\rho = 0$, which are asymptotically fully efficient.
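The relation between $d$ and $r_1$ can be made concrete with a short sketch. The function names and the artificial residual series are assumptions for illustration; $r_1$ is computed here without a mean correction, which is the natural form for least-squares residuals.

```python
import numpy as np

def durbin_watson(e):
    """d = sum of squared successive differences over the sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def first_serial_correlation(e):
    """r1 for residuals, taken about zero."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

e = np.sin(0.3 * np.arange(60)) + 0.1 * np.cos(1.7 * np.arange(60))  # artificial residuals
d = durbin_watson(e)
r1 = first_serial_correlation(e)
end_term = (e[0] ** 2 + e[-1] ** 2) / np.sum(e ** 2)
print(d, 2 * (1 - r1), end_term)
```

The two quantities differ only by the end term $(e_1^2 + e_N^2)/\sum e_t^2$, which is of smaller order as the number of residuals increases.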

The covariance matrix of the $\hat\gamma_j$ $(j = 1, \ldots, k)$ converges in probability to

$$\frac{\sigma^2}{n}\,\frac{1-\rho^2}{1+\rho^2}\,X,$$

where $X$ is the matrix in the top left-hand corner of the inverse of

$$\begin{bmatrix} X_0 & X_1 \\ X'_1 & X_2 \end{bmatrix}.$$

If the $x_{jt}$ are serially independent the covariance matrix of the $\hat\gamma_j$ $(j = 1, \ldots, k)$ becomes
in the limit

$$\frac{\sigma^2}{n}\,\frac{1-\rho^2}{1+\rho^2}\,X_0^{-1}.$$

But these $\hat\gamma_j$ are unbiased estimators of the $\beta_j$. The covariance matrix of the straightforward
least-squares estimates of the $\beta_j$ is, in this very special case (Wold, 1953, p. 213),

$$\frac{\sigma^2}{2n}\,X_0^{-1}.$$

The relative efficiency of the estimates of the $\beta_j$ by the two methods is then $\tfrac12(1+\rho^2)(1-\rho^2)^{-1}$.
The estimates obtained from the $\hat\gamma_j$ will, in this case, be more efficient than those from least
squares when $|\rho|\sqrt3 > 1$.


In general, the relative efficiency of the two methods will depend upon the correlogram
of the $x_{jt}$ as well as $\rho$. For example, in the case where there is only one regressor, which
follows a simple Markoff process with parameter $\rho_1$, the variance of $\hat\gamma_1$ will be, in the limit,

$$\frac{\sigma^2}{n}\,\frac{(1-\rho^2)}{(1+\rho^2)}\,\frac{(1+\rho_1^2)}{(1-\rho_1^2)}\,\frac{1}{\sigma_1^2},$$

where $\sigma_1^2$ is the variance of $x_t$.
The variance of the straightforward least-squares estimate from $2n$ observations will then
tend to (Wold, 1953, p. 211)

$$\operatorname{var}(\hat\beta) = \frac{\sigma^2}{2n}\,\frac{1+\rho\rho_1}{1-\rho\rho_1}\,\frac{1}{\sigma_1^2}.$$

Table 3. $\operatorname{var}(\hat\beta)/\operatorname{var}(\hat\gamma_1)$

ρ₁ \ ρ    -0.8   -0.6   -0.4   -0.2    0      0.2    0.4    0.6    0.8
0         2.28   1.06   0.69   0.54   0.50   0.54   0.69   1.06   2.28
0.2       1.52   0.77   0.54   0.46   0.46   0.54   0.75   1.25   2.90
0.4       0.85   0.47   0.36   0.33   0.36   0.42   0.69   1.26   3.20
0.6       0.38   0.24   0.20   0.20   0.24   0.32   0.63   1.06   3.05
0.8       0.11   0.08   0.08   0.09   0.11   0.16   0.29   0.66   2.28

The relative efficiency $\operatorname{var}(\hat\beta)/\operatorname{var}(\hat\gamma_1)$
is shown in Table 3. This table will also give the relative efficiencies of the $\hat\gamma_j$ $(j = 1, \ldots, k)$
for a $k$ variate regression when each of the $x_{jt}$ follows a simple Markoff process with the same
parameter $\rho_1$.
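The ratio tabulated above follows directly from the two limiting variances. As a sketch (the function name `relative_efficiency` is an assumption), the ratio of the least-squares variance to the variance of $\hat\gamma_1$ is

```python
def relative_efficiency(rho, rho1):
    """var(beta-hat) / var(gamma-hat-1) for one Markoff regressor with parameter rho1."""
    var_ls = 0.5 * (1.0 + rho * rho1) / (1.0 - rho * rho1)          # times sigma^2/(n sigma_1^2)
    var_gamma = ((1.0 - rho ** 2) / (1.0 + rho ** 2)) * \
                ((1.0 + rho1 ** 2) / (1.0 - rho1 ** 2))              # same scale factor
    return var_ls / var_gamma

for rho1 in (0.0, 0.2, 0.8):
    print(rho1, [round(relative_efficiency(rho, rho1), 2) for rho in (-0.8, 0.0, 0.8)])
```

Values above 1 correspond to cases in which the estimate from the conditional regression has the smaller limiting variance.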


The limiting variance of 9, is 0*1 م‎ 


The corresponding estimate of $\alpha$ is $\hat\gamma_0(1 - \hat\gamma_{k+1})^{-1}$, and this has the asymptotic variance


gc? 1+р 2. 2x 
жау tet Sa 


The variance of the straightforward least-squares estimate of $\alpha$ will tend to


so that the estimate obtained from $\hat\gamma_0$ is always relatively inefficient and is very inefficient
for values of $\rho$ near 1. In such cases a better estimate than that obtained from $\hat\gamma_0$ would be
$\hat\alpha = \bar y - \sum_{j=1}^{k}\hat\gamma_j\bar x_j$, where the means are obtained from all of the observations.


A common procedure, when positive serial correlation is suspected in the residuals, is to
estimate the regression coefficients from the first differences of the $y_t$ and $x_{jt}$ (see Cochrane
& Orcutt, 1949). Again the efficiency of this method will depend upon the correlogram of
the $x_{jt}$. When there is only one regressor, which follows a simple Markoff process with
variance $\sigma_1^2$ and serial correlation $\rho_1$, the asymptotic variance of the least-squares estimate
of $\beta_1$, from the first differences, can be shown to be (when estimated from $2n$ observations)


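$$\frac{\sigma^2}{\sigma_1^2}\,\frac{(1-\rho)}{4n(1-\rho_1)}\left[2 + \frac{(1-\rho)(1-\rho_1)}{1-\rho\rho_1}\right].$$

If $\rho > 0$ this will be smaller than the variance of the estimator $\hat\gamma_1$.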

It must be emphasized that the validity of the tests of the regression coefficients, together
with the estimates of these coefficients (including the estimate of $\rho$ derived from $\hat\gamma_{k+1}$),
depends on the specification of the errors in the regression equation as a simple Markoff
process being correct. The test of significance of $\rho$ is, of course, always an exact test.

Example. Durbin & Watson (1951) use economic data due to A. R. Prest (1949) to demonstrate
their method. The original data, given in their paper, cover the period 1870-1938,
and show the logarithm of the consumption of spirits per head in the United Kingdom ($y_t$),
the logarithm of real income per head ($x_{1t}$) and the logarithm of the relative price of spirits
($x_{2t}$). The estimates of the coefficients from the least-squares regression of $y$ on $x_1$ and $x_2$ are

$$\hat\beta_1 = -0.120, \qquad \hat\beta_2 = -1.228.$$

Their statistic $d = 0.2488$ and is far below the lower bound to the significance point for
$d$ at the 1% level, indicating a highly significant positive serial correlation.

We will apply the method presented in this paper to this example. 

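The coefficients $\hat\gamma_j$ are given by

$$\begin{bmatrix}\hat\gamma_1\\ \hat\gamma_2\\ \hat\gamma_3\\ \hat\gamma_4\\ \hat\gamma_5\end{bmatrix} = \begin{bmatrix} 0.307 & 0.490 & -0.651 & 0.295 & 0.495\\ 0.490 & 1.445 & -1.818 & 0.474 & 1.423\\ -0.651 & -1.818 & 2.423 & -0.631 & -1.815\\ 0.295 & 0.474 & -0.631 & 0.288 & 0.478\\ 0.495 & 1.423 & -1.815 & 0.478 & 1.417 \end{bmatrix}^{-1} \begin{bmatrix} -0.636\\ -1.818\\ 2.401\\ -0.621\\ -1.804 \end{bmatrix} = \begin{bmatrix} 0.795\\ -0.693\\ 0.950\\ -0.792\\ 0.630 \end{bmatrix},$$

$$\hat\gamma_0 = 9.218, \qquad \hat\sigma^{*2} = 0.000232.$$

The elements of the 5 × 5 matrix are $\sum_{t=1}^{34}(z_{it} - \bar z_i)(z_{jt} - \bar z_j)$. For example,

$$0.495 = \sum_{t=1}^{34}\{\tfrac12(x_{2,2t-1} + x_{2,2t+1}) - \bar z_5\}(x_{1,2t} - \bar z_1),$$

where $\bar z_5 = \tfrac{1}{34}\sum_{t=1}^{34}\tfrac12(x_{2,2t-1} + x_{2,2t+1})$ and $\bar z_1 = \tfrac{1}{34}\sum_{t=1}^{34}x_{1,2t}$.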

All of the parameters $\gamma_j$ are significant at the 1% point. The value of $F$ corresponding to
$\hat\gamma_3$ equals 308 in fact. However, the 5% confidence intervals for $\gamma_1$ and $\gamma_2$ do not include the
straightforward least-squares estimates. They are:

$$0.339 < \gamma_1 < 1.201, \qquad -0.991 < \gamma_2 < -0.405.$$

On prior grounds the value of $-0.120$ for $\beta_1$ is unacceptable, for it is very unlikely that
the demand for spirits would fall as income rises. In the original work of A. R. Prest (1949)
the coefficients were estimated from the first differences of the observations and the
resulting regression coefficients were

$$\hat\beta_1 = 0.736, \qquad \hat\beta_2 = -0.861,$$

both of which lie within the 5% confidence intervals for $\gamma_1$ and $\gamma_2$.


The variates $x_1$ and $x_2$ have high positive serial correlation coefficients of about the same
value. Together with the high positive serial correlation of the residuals this suggests that
the extension of Ogawara's method here presented will give estimators of $\beta_1$ and $\beta_2$ which
will be at least as efficient as those from the straightforward least-squares procedure. This
is to some extent borne out by the results.


I wish to thank Prof. P. A. P. Moran for suggesting this subject and its developments to 
me and for his advice in the research done and in the preparation of this paper. I should 
also like to thank the referee for a number of suggestions and, in particular, for pointing 
out the importance of the first difference transformation. 


REFERENCES 


BARTLETT, M. S. (1946). On the theoretical specification and sampling properties of auto-correlated
time series. J.R. Statist. Soc. Suppl. 8, 27.

COCHRANE, D. & ORCUTT, G. H. (1949). Application of least squares regression to relationships
containing auto-correlated error terms. J. Amer. Statist. Ass. 44, 32.

DURBIN, J. & WATSON, G. S. (1950, 1951). Testing for serial correlation in least squares regression. I.
Biometrika, 37, 409; II. Biometrika, 38, 159.

MORAN, P. A. P. (1950). A test for serial independence of residuals. Biometrika, 37, 178.

OGAWARA, M. (1951). A note on the test of serial correlation coefficients. Ann. Math. Statist. 22, 115.

PREST, A. R. (1949). Some experiments in demand analysis. Rev. Econ. Statist. 31, 33.

STUART, A. (1954). Asymptotic relative efficiencies of distribution-free tests of randomness against
normal alternatives. J. Amer. Statist. Ass. 49, 147.

WALD, A. (1948). Asymptotic properties of the maximum likelihood estimate of the unknown parameter
of a discrete stochastic process. Ann. Math. Statist. 19, 40.

WOLD, H. (1953). Demand Analysis. New York: John Wiley and Sons.




ON THE EFFICIENCY OF PROCEDURES FOR SMOOTHING 
PERIODOGRAMS FROM TIME SERIES WITH 
CONTINUOUS SPECTRA 


By M. S. BARTLETT AND J. MEDHI
University of Manchester 


1. INTRODUCTION 
Let $X_r$ $(r = 1, 2, \ldots, n)$ be a set of consecutive observations from a real stochastic process
$X_t$ in discrete time. The process is assumed stationary, at least up to the second order, so
that the stochastic average $E\{X_t\}$ is a constant ($m$), which for convenience is put zero, and
the autocovariance function

$$E\{X_tX_{t+s}\} = w_s = \sigma^2\rho_s \tag{1.1}$$

is a function only of the interval $s$. Then an autocorrelation function $\rho_s$ $(s = 0, \pm1, \pm2, \ldots)$
is a valid one for some stationary process $X_t$ if and only if

$$\rho_s = \int_0^\pi \cos(s\lambda)\,dF(\lambda), \tag{1.2}$$

where $F(\lambda)$, the (integrated) spectrum of the process, has the form of a distribution function
defined in $(0, \pi)$ (Wold, 1938). Ignoring the possibility of a third singular component, we
have further the relation

$$F(\lambda) = F_1(\lambda) + F_2(\lambda), \tag{1.3}$$

where $F_1(\lambda)$ is a step function, the (integrated) discrete spectrum, and $F_2(\lambda)$ has a non-zero
derivative $f(\lambda)$, the continuous spectrum. In particular when $F_1(\lambda)$ is zero (as will now be
assumed)

$$\rho_s = \int_0^\pi \cos(s\lambda)f(\lambda)\,d\lambda, \tag{1.4}$$

which has the inversion formula

$$\pi f(\lambda) = \sum_{s=-\infty}^{\infty}\rho_s\cos(s\lambda). \tag{1.5}$$

We define $g(\lambda) = 2\pi\sigma^2f(\lambda)$, so that

$$g(\lambda) = 2\sum_{s=-\infty}^{\infty}w_s\cos(s\lambda). \tag{1.6}$$
s=- \ 

The problem of estimating directly the spectral density function $f(\lambda)$ (or $g(\lambda)$), previously
considered by Daniell (see the discussion to the paper by Bartlett, 1946), Bartlett (1950)
and Grenander (1951), will be discussed further in this paper. Grenander & Rosenblatt
(1952, 1954) have recently investigated also the problem of constructing an entire confidence
band for the integrated density function $g(\lambda)$; for further suggestions on this
problem, which will not be considered here, see Bartlett (1954).


2. ESTIMATES OF THE SPECTRAL DENSITY FUNCTION

The sample autocovariance, which will be defined as

$$C_s = \begin{cases}\dfrac{1}{n}\sum_{r=1}^{n-|s|}X_rX_{r+|s|} & (|s| < n),\\[1ex] 0 & (|s| \ge n),\end{cases} \tag{2.1}$$

is, under wide conditions, a consistent estimate of $w_s$. If now we substitute $C_s$ for $w_s$ in
(1.6), we obtain

$$2\sum_{s=-n}^{n}C_s\cos(s\lambda) = \frac{2}{n}\left|\sum_{r=1}^{n}X_re^{ir\lambda}\right|^2 = I(\lambda), \tag{2.2}$$

where $I(\lambda)$ is the periodogram intensity $G(\lambda)G^*(\lambda) = A^2(\lambda) + B^2(\lambda)$, $G(\lambda)$ being calculated as

$$\sqrt{\frac{2}{n}}\sum_{r=1}^{n}X_re^{ir\lambda} \tag{2.3}$$

(usually for integral $p$, where $\lambda = 2\pi p/n$). However, in spite of the relation (2.2), it is now
known that $I(\lambda)$ does not provide a consistent estimate of $g(\lambda)$, owing to its large sampling
fluctuations which do not diminish as $n$ increases. The relevant stochastic properties of $I(\lambda)$
have been discussed by Slutsky (1927), Bartlett (1950, 1954), Grenander (1951) and others,
and will merely be quoted when necessary. The further condition imposed on $X_t$ for these
properties to hold is that, if not normal, at least it has the linear form

$$X_t = \sum_{u=0}^{\infty}b_uY_{t-u}, \tag{2.4}$$


where Y, is a sequence of independent quantities with zero mean and constant variance c?. 
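The identity in (2.2) between the cosine transform of the sample autocovariances and the squared modulus of the transformed series can be verified numerically. The sketch below is illustrative only; the function names and the random test series are assumptions.

```python
import numpy as np

def autocov(x, s):
    """C_s of (2.1): divisor n, zero for |s| >= n, mean taken as zero."""
    n, s = len(x), abs(s)
    return float(np.dot(x[:n - s], x[s:])) / n if s < n else 0.0

def periodogram_from_autocov(x, lam):
    """Left-hand side of (2.2)."""
    n = len(x)
    return 2.0 * sum(autocov(x, s) * np.cos(s * lam) for s in range(-(n - 1), n))

def periodogram_direct(x, lam):
    """|G(lambda)|^2 with G as in (2.3)."""
    n = len(x)
    r = np.arange(1, n + 1)
    G = np.sqrt(2.0 / n) * np.sum(x * np.exp(1j * r * lam))
    return float(abs(G) ** 2)

rng = np.random.default_rng(4)
x = rng.normal(size=50)
lam = 2 * np.pi * 7 / 50          # lambda = 2*pi*p/n with p = 7
print(periodogram_from_autocov(x, lam), periodogram_direct(x, lam))
```

The direct form needs of the order of $n$ operations per frequency, against $n^2$ for the autocovariance form, so it is the natural one to use when the whole periodogram is wanted.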
In view of the sampling properties of $I(\lambda)$, 'smoothed' estimates of $g(\lambda)$ have been
proposed. Thus Daniell put forward the estimate

$$g_D(\lambda) = \frac{1}{2h}\int_{\lambda-h}^{\lambda+h}I(\omega)\,d\omega; \tag{2.5}$$

and Bartlett's formula (cf. also Tukey and Hamming, 1949) for smoothing the periodogram
$I(\lambda)$ is

$$g_B(\lambda) = 2\sum_{s=-n'+1}^{n'-1}\left(1 - \frac{|s|}{n'}\right)C'_s\cos(s\lambda), \tag{2.6}$$

where $C'_s = C_s/(1 - |s|/n)$. The resolving power of this last procedure increases with $n'$,
and the smoothing with $m = n/n'$ (where $n$ is the total length of the series), fluctuations
being of order $1/\sqrt m$. Grenander subsequently examined a whole class of estimates containing
the above two, by introducing a general weighting factor, $u_s(\lambda)$, in the formula,

$$g_G(\lambda) = 2\sum_s u_s(\lambda)C_s\cos(s\lambda), \tag{2.7}$$

or in alternative form (for suitable conditions on $u_s(\lambda)$)

$$g_G(\lambda) = \int_0^\pi I(\omega)w_\lambda(\omega)\,d\omega, \tag{2.8}$$

where the positive weighting function $w_\lambda(\omega)$ is expressible in terms of $u_s(\lambda)$ by the formula

$$w_\lambda(\omega) = \frac{1}{\pi}\sum_s u_s(\lambda)\cos(s\lambda)\cos(s\omega). \tag{2.9}$$

The estimate $g_G(\lambda)$ has the asymptotic sampling properties

$$E\{g_G(\lambda)\} \sim \int_0^\pi g(\omega)w_\lambda(\omega)\,d\omega, \tag{2.10}$$

$$\operatorname{var}\{g_G(\lambda)\} \sim \frac{2\pi}{n}\int_0^\pi g^2(\omega)w_\lambda^2(\omega)\,d\omega. \tag{2.11}$$
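The smoothed estimates can be sketched in a few lines. This is an illustration under assumptions: the function names are hypothetical, the lag-weighted form follows (2.7) with the autocovariances $C_s$ (not the adjusted $C'_s$ of (2.6)), and Daniell's integral is approximated by an average over an equally spaced grid.

```python
import numpy as np

def autocov(x, s):
    n, s = len(x), abs(s)
    return float(np.dot(x[:n - s], x[s:])) / n if s < n else 0.0

def periodogram(x, lam):
    n = len(x)
    r = np.arange(1, n + 1)
    return float(abs(np.sum(x * np.exp(1j * r * lam))) ** 2) * 2.0 / n

def lag_window_estimate(x, lam, u):
    """General form (2.7): 2 * sum_s u(s) C_s cos(s*lambda)."""
    n = len(x)
    return 2.0 * sum(u(s) * autocov(x, s) * np.cos(s * lam) for s in range(-(n - 1), n))

def bartlett_weights(n_prime):
    """Triangular lag weights 1 - |s|/n', zero beyond n'."""
    return lambda s: max(0.0, 1.0 - abs(s) / n_prime)

def daniell_estimate(x, lam, h, grid=101):
    """Approximate (1/2h) * integral of I over (lam - h, lam + h) by a grid average."""
    omegas = np.linspace(lam - h, lam + h, grid)
    return float(np.mean([periodogram(x, w) for w in omegas]))

rng = np.random.default_rng(2)
x = rng.normal(size=64)
lam = 1.0
g_bartlett = lag_window_estimate(x, lam, bartlett_weights(8))
g_daniell = daniell_estimate(x, lam, h=0.2)
print(g_bartlett, g_daniell)
```

With unit weights the lag-weighted form returns the raw periodogram, which is a convenient check that a particular choice of weights has been coded correctly.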


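The relation of Daniell's estimate $g_D(\lambda)$ in (2.5) to the general formula (2.8) is obvious; the
other estimate in (2.6) has for large $n$ the effective weighting factor

$$u_s(\lambda) = \begin{cases}1 - \dfrac{|s|}{n'} & (|s| \le n'),\\[1ex] 0 & (|s| > n'),\end{cases} \tag{2.12}$$

and the weighting function corresponding to $u_s(\lambda)$ in (2.12) is

$$w_\lambda(\omega) = \frac{1}{2\pi n'}\left[\frac{\sin^2\{\tfrac12n'(\omega-\lambda)\}}{\sin^2\{\tfrac12(\omega-\lambda)\}} + \frac{\sin^2\{\tfrac12n'(\omega+\lambda)\}}{\sin^2\{\tfrac12(\omega+\lambda)\}}\right]. \tag{2.13}$$

Clearly

$$E\{g_D(\lambda)\} \sim \frac{1}{2h}\int_{\lambda-h}^{\lambda+h}g(\omega)\,d\omega \sim g(\lambda) \quad\text{for small } h \text{ only}, \tag{2.14}$$

and

$$\operatorname{var}\{g_D(\lambda)\} \sim \frac{\pi g^2(\lambda)}{nh} \quad\text{for small } h. \tag{2.15}$$

Grenander has given also the results

$$E\{g_B(\lambda)\} \sim g(\lambda), \tag{2.16}$$

$$\operatorname{var}\{g_B(\lambda)\} \sim \begin{cases}\dfrac{2n'}{3n}\,g^2(\lambda) & (\lambda \ne 0),\\[1ex] \dfrac{4n'}{3n}\,g^2(\lambda) & (\lambda = 0).\end{cases} \tag{2.17}$$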
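The closed form (2.13) can be checked against the defining sum (2.9) with the triangular weights of (2.12). The following sketch uses assumed function names and arbitrary test values of $\omega$, $\lambda$ and $n'$.

```python
import numpy as np

def w_from_weights(omega, lam, n_prime):
    """(2.9) with u_s = 1 - |s|/n' for |s| < n'."""
    s = np.arange(-(n_prime - 1), n_prime)
    u = 1.0 - np.abs(s) / n_prime
    return float(np.sum(u * np.cos(s * lam) * np.cos(s * omega)) / np.pi)

def w_closed_form(omega, lam, n_prime):
    """(2.13): sum of two Fejer-type kernels centred at lam and -lam."""
    def kernel(theta):
        return np.sin(0.5 * n_prime * theta) ** 2 / np.sin(0.5 * theta) ** 2
    return float((kernel(omega - lam) + kernel(omega + lam)) / (2.0 * np.pi * n_prime))

omega, lam, n_prime = 1.0, 0.7, 8
print(w_from_weights(omega, lam, n_prime), w_closed_form(omega, lam, n_prime))
```

The closed form is undefined where $\omega = \pm\lambda$ exactly; there the sum form, or the limiting value of the kernel, should be used instead.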


3. UNCERTAINTY RELATIONS 
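When Grenander investigated further the sampling properties of $g_G(\lambda)$, he imposed the
condition

$$\int_0^\pi w_\lambda(\omega)\,d\omega = 1 \tag{3.1}$$

as a 'sort of asymptotic unbiasedness', but this condition (which is satisfied by the
weighting function for $g_D(\lambda)$, even for $h$ not small) appears unsatisfactory, and we shall use
the correct condition for asymptotic unbiasedness from formula (2.10), viz.

$$\int_0^\pi g(\omega)w_\lambda(\omega)\,d\omega = g(\lambda). \tag{3.2}$$

With this condition the formula for $\operatorname{var}\{g_G(\lambda)\}$ in (2.11) yields

$$\frac{n}{2\pi}\operatorname{var}\{g_G(\lambda)\} \sim \int_0^\pi g^2(\omega)w_\lambda^2(\omega)\,d\omega \ge \frac{1}{\pi}\left[\int_0^\pi g(\omega)w_\lambda(\omega)\,d\omega\right]^2 = \frac{1}{\pi}g^2(\lambda),$$

whence asymptotically

$$\operatorname{var}\{g_G(\lambda)\} \ge \frac{2g^2(\lambda)}{n}. \tag{3.3}$$

Moreover, the condition for equality is reached if

$$w_\lambda(\omega) = g(\lambda)/[\pi g(\omega)]. \tag{3.4}$$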


This 'ideal' estimate thus has weighting function inversely proportional to $g(\omega)$,† not $g^2(\omega)$,
as Grenander obtained with condition (3.1). However, it is still entirely theoretical in
character, for it assumes a complete knowledge of the unknown spectral function $g(\omega)$.
Furthermore, for different values of $\lambda$, the weighting function multiplying the stochastic
quantity $I(\omega)$ remains inversely proportional to $g(\omega)$, and all the corresponding estimates
$g_G(\lambda)$ are stochastically equivalent.

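In an attempt to overcome such difficulties, Grenander made use of a measure of
'resolvability', defined in general by the 'standard deviation' of $\omega$ for the weighting
function $w_\lambda(\omega)$. As $w_\lambda(\omega)$ no longer necessarily satisfies the 'normalization' condition (3.1),
this measure is modified here to

$$U_2 = \left[\int_0^\pi(\omega-\lambda)^2w_\lambda(\omega)\,d\omega \Big/ \int_0^\pi w_\lambda(\omega)\,d\omega\right]^{1/2} \tag{3.5}$$

(note also that the deviation $\omega - \lambda$ is used, whether or not the 'mean' $\omega$ is exactly $\lambda$). It is
further generalized to

$$U_2^{(r)} = \left[\int_0^\pi|\omega-\lambda|^r\,w_\lambda(\omega)\,d\omega \Big/ \int_0^\pi w_\lambda(\omega)\,d\omega\right]^{1/r} \qquad (r > 0), \tag{3.6}$$

mainly in order to stress the rather arbitrary character of such measures. In terms of
$U_2^{(r)}$ and

$$U_1 = \int_0^\pi g^2(\omega)w_\lambda^2(\omega)\,d\omega \sim \frac{n}{2\pi}\operatorname{var}\{g_G(\lambda)\}, \tag{3.7}$$

we may now generalize Grenander's 'uncertainty principle', connecting $U_2^{(r)}$ and $U_1$, by the
following modification of his argument.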
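From the Tchebychev inequality

$$Q\{|\omega-\lambda| < 2U_2^{(r)}\} \ge \left(1 - \frac{1}{2^r}\right),$$

where $Q$ denotes 'probability' in terms of the formal frequency function
$w_\lambda(\omega)\big/\int_0^\pi w_\lambda(\omega)\,d\omega$,

$$\int_{\lambda-2U_2^{(r)}}^{\lambda+2U_2^{(r)}}w_\lambda^2(\omega)\,d\omega \ge \frac{1}{4U_2^{(r)}}\left[\int_{\lambda-2U_2^{(r)}}^{\lambda+2U_2^{(r)}}w_\lambda(\omega)\,d\omega\right]^2 \ge \frac{k_0^2}{4U_2^{(r)}}\left[1 - \frac{1}{2^r}\right]^2.$$

Let $A$, $B$ be the upper and lower bounds of $g(\omega)$ in $(0, \pi)$, these being assumed (respectively)
finite and non-zero. Then

$$U_1 \ge B^2\int_0^\pi w_\lambda^2(\omega)\,d\omega \ge \frac{B^2k_0^2}{4U_2^{(r)}}\left[1 - \frac{1}{2^r}\right]^2,$$

where

$$k_r = \int_0^\pi\omega^rw_\lambda(\omega)\,d\omega. \tag{3.8}$$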


T The ideal weighting of I(w) by a function varying as 1/g(«) is in entire accord with the asymptoti 
٤ : т tic 
e EENS to a uniform spectrum often desirable in spectral analysis (sce, for example, Bartlett, 


Now

$$Ak_0 \ge \int_0^\pi g(\omega)w_\lambda(\omega)\,d\omega \sim g(\lambda),$$

whence finally

$$U \equiv U_2^{(r)}U_1 \ge \frac{B^2g^2(\lambda)}{4A^2}\left[1 - \frac{1}{2^r}\right]^2 > 0, \tag{3.9}$$

which shows that the product $U$ of the two quantities $U_2^{(r)}$ and $U_1$ has a positive minimum.

The condition of asymptotic unbiasedness which we have imposed could perhaps be
waived by considering the mean-square deviation of $g_G(\lambda)$ from $g(\lambda)$; this would, however,
no longer give a criterion like $U_1$, asymptotically independent of $n$, so that it seems more
convenient to adopt the conditions above.


4. ASYMPTOTIC EFFICIENCIES OF $g_D(\lambda)$ AND $g_B(\lambda)$


From (3.9) we may define a quantity

$$W = U/g^2(\lambda) \tag{4.1}$$

as a measure of uncertainty depending on the form of the estimate $g_G(\lambda)$, and on the choice
of $r$ in $U_2^{(r)}$. We compare $g_D(\lambda)$ and $g_B(\lambda)$ by this criterion. The 'ideal' estimate (for $r = 2$)
with the weighting function given in (3.4) is modified to

$$w_\lambda(\omega) = g(\lambda)/[2hg(\omega)] \tag{4.2}$$

if restricted to the interval $(\lambda-h, \lambda+h)$. For small $h$, $g(\omega) \sim g(\lambda)$, so that $g_D(\lambda)$ approximates
to the 'ideal' estimate if the latter is so restricted. As, for $g_D(\lambda)$ with small $h$,

$$U_2^{(r)} \sim h(r+1)^{-1/r}, \qquad U_1 \sim \tfrac12g^2(\lambda)/h, \tag{4.3}$$

we have

$$W \sim \tfrac12(r+1)^{-1/r}. \tag{4.4}$$
Table 1 


A numerical calculation (Table 1) was made of $W$ for the restricted 'ideal' estimate
($r = 2$) for varying $h$ in the particular case of the spectrum of M. G. Kendall's artificial
series I (Kendall, 1946). As, for small $h$, $W \to 0.289$ (for all $\lambda$ within the total range), the above
results show that $W$ is smallest, in this particular case at least, as $h \to 0$, for the above class
of estimate. More generally, for small $h$, it is readily shown that


1 2ch А 
V = zs , (4 5 
1 dg(A)\? 1 @g(A) i 
== «(a dx) зо ac id 


so that whether $W$ increases or decreases as $h$ increases from zero depends on the sign of $c$;
it will, however, increase in the neighbourhood of a maximum of $g(\lambda)$.


For $g_B(\lambda)$ it can be shown (e.g. by the use of (2.13)) that
4[ín— a е 1 
(USP ~ ا‎ log cos $À + = log sin ^+ 1003] | 
2 . 1 | (47) 
Up~ 5,18" ; | 
[UPI Ц) cos (4m) T(2-7)] (0«r«1). 
Making use of the result (2.17) (for $\lambda \ne 0$), we arrive at the comparison given in Table 2 of
$g_D(\lambda)$ and $g_B(\lambda)$ in terms of $W$. Asymptotically, $g_D(\lambda)$ is thus superior to $g_B(\lambda)$ if the criterion
$W$ is taken for $r = 2$ (or 1), but the generalization of the resolvability measure $U_2$ emphasized
its arbitrary character, and the 'efficiencies' of the two estimates cross over as $r$ decreases.


Table 2

r       g_D(λ)    g_B(λ)
2       0.289     0.125 √n'   (λ = ½π)
1       0.250     0.0675 log n'
1/2     0.222     0.270
1/3     0.211     0.222
0.1     0.193     0.177
0+      0.184     0.162
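The column for $g_D(\lambda)$ follows from (4.4). As a brief sketch (the function name `w_daniell` is an assumption), the values can be reproduced as follows; the limit as $r \to 0+$ is $\tfrac12e^{-1}$.

```python
import math

def w_daniell(r):
    """W for Daniell's estimate with small h, from (4.4): (1/2) * (r + 1)^(-1/r)."""
    return 0.5 * (r + 1.0) ** (-1.0 / r)

for r in (2.0, 1.0, 0.5, 1.0 / 3.0, 0.1):
    print(round(r, 3), round(w_daniell(r), 3))
print("0+", round(0.5 / math.e, 3))
```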


5. INVESTIGATION OF THE THEORETICAL OPTIMUM WEIGHTING FUNCTION 


The replacement of $U_1$ by $U_2^{(r)}U_1$ as a criterion of efficiency naturally complicates the
investigation of the 'ideal' weighting function, but while this will still of course depend on
a knowledge of the theoretical spectrum, it seemed worth while investigating the form of
the optimum function if this criterion were adopted. We examine the case $r = 2$. Let
$w_\lambda(\omega) = v^2(\omega)$, where $v(\omega)$ is real, be the optimum function, and consider the slightly modified
function $v(\omega) + \tfrac12\eta V(\omega)$, where $\eta$ is small and $V(\omega)$ arbitrary. Denote the change in $U_2U_1$ by
$\delta(U_2U_1)$. Now from (3.8)

$$U_2^2 = (k_2 - 2\lambda k_1 + \lambda^2k_0)/k_0, \tag{5.1}$$

and, to the first order of $\eta$, $\delta(k_r) = \eta K_r$, where

$$K_r = \int_0^\pi\omega^rv(\omega)V(\omega)\,d\omega. \tag{5.2}$$

Hence

$$\delta(U_2) = \eta\{(K_2 - 2\lambda K_1 + \lambda^2K_0)/(2U_2k_0) - K_0U_2/(2k_0)\}.$$

Also

$$\delta(U_1) = 2\eta\int_0^\pi g^2(\omega)v^3(\omega)V(\omega)\,d\omega = \eta L, \text{ say}.$$

Thus finally

$$\delta(U_2U_1) = \frac{\eta}{2k_0}\left[(K_2 - 2\lambda K_1 + \lambda^2K_0)U_1/U_2 - K_0U_2U_1 + 2k_0U_2L\right]. \tag{5.3}$$

As $v(\omega)$ is assumed to be the optimum function, the coefficient of $\eta$ should vanish. This
implies that the expression within the square brackets in (5.3) should vanish for any appropriate
$V(\omega)$. The condition of asymptotic unbiasedness gives the condition

$$\int_0^\pi g(\omega)v(\omega)V(\omega)\,d\omega = 0, \tag{5.4}$$


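which will be satisfied if we take in particular

$$V(\omega)\,d\omega = d\left\{\frac{\epsilon(\omega-\alpha)}{g(\alpha)v(\alpha)} - \frac{\epsilon(\omega-\beta)}{g(\beta)v(\beta)}\right\},$$

where

$$\epsilon(x) = \begin{cases}0 & (x < 0),\\ 1 & (x \ge 0).\end{cases}$$

The expression in square brackets in (5.3) now becomes

$$\frac{U_1}{U_2}\left\{\frac{(\alpha-\lambda)^2}{g(\alpha)} - \frac{(\beta-\lambda)^2}{g(\beta)}\right\} - U_2U_1\left\{\frac{1}{g(\alpha)} - \frac{1}{g(\beta)}\right\} + 4k_0U_2\{g(\alpha)v^2(\alpha) - g(\beta)v^2(\beta)\},$$

and, as this should be zero for arbitrary $\alpha$, $\beta$, we obtain

$$\frac{U_1}{U_2}\,\frac{(\alpha-\lambda)^2}{g(\alpha)} - \frac{U_2U_1}{g(\alpha)} + 4k_0U_2\,g(\alpha)v^2(\alpha) = \text{constant},$$

or

$$w_\lambda(\alpha) = \frac{A}{g(\alpha)} + \frac{B}{g^2(\alpha)}\left[1 - \frac{(\alpha-\lambda)^2}{U_2^2}\right], \tag{5.5}$$

which determines the optimum form (if this exists in the above sense). It depends of course,
as anticipated, on the unknown theoretical spectral function $g(\omega)$. In (5.5),

$$B = \tfrac14U_1/k_0, \tag{5.6}$$

and the two formulae for $U_1$ and $U_2$ give, with the unbiasedness condition, three further
relations for $A$, $B$, $U_1$ and $U_2$. If these yield an appropriate solution, the minimum $U_2U_1$
is then also calculable.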

6. EQUATIONS FOR THE OPTIMUM FUNCTION IN THE CASE OF THE 
SECOND-ORDER AUTOREGRESSIVE PROCESS 


The determination of the coefficients in (5.5) is unfortunately extremely laborious, and
sometimes impossible, as was experienced in the particular case of the second-order
autoregressive process

$$X_t + aX_{t-1} + bX_{t-2} = Y_t. \tag{6.1}$$

For (6.1) we have

$$g(\lambda) = 2\pi\sigma^2f(\lambda) = \delta/(\alpha + \beta\cos\lambda + \gamma\cos2\lambda), \tag{6.2}$$
where a = 1 +a 4-02, В = 2a(1 +b), у = 2b, à = 20° (see Bartlett, 1950). Define 


a= ergo, U = iocari 
0 


which are evaluable for the $g(\lambda)$ of (6.2). Then it will be found that (5.6) yields the equation


42 + 24( B — 9C) +a В? — WPBO + bf? O° = ABh,, (6-3) 

where С = B/U}, and Aa® + Bag — СЫ? = ko. (6:4) 

The relation for U, give AU + BLY — Cb = ky B/C, (6:5) 
and the unbiasedness condition gives 

Ат + Ba — Cb? = g(A). (6-6) 


Substitution of $k_0$ in (6.3) and (6.5) gives, with the aid of (6.6), two equations of second
degree in $A$ and $B$. $A$ being found in terms of $B$ from one of them, a biquadratic equation in
$A$ results from the other. An examination of an actual numerical case of (6.2) revealed that
a solution in the admissible ranges for the unknowns is not always possible; in any case, if
we recall that the solutions even when they exist are only available when $g(\lambda)$ is known, these
equations seem unlikely to be used in practice.


7. THEORETICAL GAIN IN EFFICIENCY WITH WEIGHTING FUNCTION IN RESTRICTED RANGE


In view of the difficulty of handling the optimum weighting function (on the basis of the
criterion U), it may be advisable to demonstrate that the optimum function on the basis
of the criterion U, but restricted to the small interval $\lambda - h$, $\lambda + h$, viz.

$$w(\omega) = g(\lambda)/[2hg(\omega)] \sim 1/2h, \tag{7·1}$$

may at least be improved on the basis of the criterion U. As, for small $h$, $g(\omega) \sim g(\lambda)$ to the
first order, the weighting function in (5·5) may be written to the same degree of approximation

$$A' + B'(\omega - \lambda)^2,$$

and it is easily found from the further relations for $A$, $B$, $U_1$ and $U_2$ that

$$w(\omega) = \frac{3}{4h}\left\{1 - \frac{(\omega - \lambda)^2}{h^2}\right\}. \tag{7·2}$$

Moreover, for (7·2), $W = 3\sqrt{5}/25 = 0·268$.


This is only a small improvement over the value 0·289 for $W$ with the Daniell estimate
$g_D(\lambda)$, but it will be noticed that this approximate weighting function is independent of the
unknown spectral function $g(\lambda)$, and hence also usable in practice. As the approximate bias
in the resulting estimate $g_W(\lambda)$ is reduced from $\tfrac{1}{6}h^2\,d^2g(\lambda)/d\lambda^2$ to $\tfrac{1}{10}h^2\,d^2g(\lambda)/d\lambda^2$, the use of
$g_W(\lambda)$ in place of $g_D(\lambda)$ might sometimes be preferable when the periodogram $J(\omega)$ is
available.
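The two quoted values of $W$ and the two bias coefficients can be checked numerically. The sketch below is only illustrative: it assumes that $W$ is the product of $\int w^2$ and the root-mean-square width of $w$ (an assumption adopted here because it reproduces both 0·289 and 0·268), and the function name `resolvability_W` is hypothetical.

```python
import numpy as np

def resolvability_W(weight, h=1.0, m=200_000):
    """Return (W, bias coefficient) for a weight function on (-h, h).

    Assumption: W = (integral of w^2) * sqrt(integral of x^2 w).
    The bias coefficient c is the factor multiplying h^2 g''(lambda).
    """
    dx = 2.0 * h / m
    x = -h + dx * (np.arange(m) + 0.5)        # midpoint grid on (-h, h)
    w = weight(x, h)
    w = w / (w.sum() * dx)                    # normalise to unit area
    second_moment = (x**2 * w).sum() * dx
    W = (w**2).sum() * dx * np.sqrt(second_moment)
    return W, 0.5 * second_moment / h**2

def daniell(x, h):
    """Uniform weights 1/2h, as in (7.1)."""
    return np.full_like(x, 1.0 / (2.0 * h))

def parabolic(x, h):
    """Parabolic weights, as in (7.2)."""
    return 0.75 / h * (1.0 - (x / h) ** 2)

W_D, c_D = resolvability_W(daniell)
W_P, c_P = resolvability_W(parabolic)
print(W_D, c_D)   # compare with 0.289 and 1/6
print(W_P, c_P)   # compare with 0.268 and 1/10
```

Under that assumed definition the ratio of the two values of $W$ does not depend on $h$, which is why `h=1.0` suffices for the comparison.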

The merits of the other estimate considered, $g_{n'}(\lambda)$, rest in its asymptotic unbiasedness
and its convenience of calculation from the first $n' - 1$ sample autocovariances (or
autocorrelations). From Table 2 we saw that its efficiency depended on what measure of
resolvability was used.


REFERENCES


BARTLETT, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated
time series. J. Roy. Statist. Soc. Suppl. 8, 27.

BARTLETT, M. S. (1950). Periodogram analysis and continuous spectra. Biometrika, 37, 1.

BARTLETT, M. S. (1954). Problèmes de l'analyse spectrale des séries temporelles stationnaires.
Publ. Inst. Statist. (Univ. de Paris) vol. III, fasc. 3, 119.

GRENANDER, U. (1951). On empirical spectral analysis of stochastic processes. Ark. Mat. 1, 503.

GRENANDER, U. & ROSENBLATT, M. (1952). On spectral analysis of stationary time-series. Proc. Nat.
Acad. Sci., Wash., 38, 519.

GRENANDER, U. & ROSENBLATT, M. (1954). Statistical spectral analysis of time-series arising from
stationary stochastic processes. Ann. Math. Statist. 24, 537.

KENDALL, M. G. (1946). Oscillatory Time Series. Cambridge University Press.

SLUTSKY, E. (1927). The summation of random causes as the source of cyclic processes. Problems of
economic conditions, ed. by the Conjuncture Institute, Moscow, 3, no. 1. (Later reprinted in
Econometrica, 5 (1937), 105.)

TUKEY, J. W. & HAMMING, R. W. (1949). Measuring noise color, I. Bell Tel. Lab. Memo.

WOLD, H. (1938). A Study in the Analysis of Stationary Time Series, 1st ed. Uppsala.




THE AUTOCORRELATION FUNCTION AND THE 
SPECTRAL DENSITY FUNCTION 


By J. WISE
London School of Economics 


INTRODUCTION AND SUMMARY 


The relationship existing between the autocorrelation function and the spectral density
function of a stationary, purely non-deterministic stochastic process is well known.
Discussions of the relationship have been given by Wiener (1930) and Khintchine (1934), in the
case of the continuous process, and by Wold (1954, Chapter 2, §17, pp. 66-75), and Doob
(1953, Chapter 10, §§3 and 4, pp. 473-86) for the discrete process. The treatment in the
present paper is confined to the discrete process.*

The relationship referred to also provides a fundamental connexion between the
autocorrelation matrix of a process and the corresponding spectral density function of the
process. Some rather ingenious manipulations on the matrix representation of the spectral
function have been carried out by Whittle (1951, Chapter 4, §2, pp. 35-6, equations (4·272)
and (4·276)). However, Whittle’s work contains some inaccuracies. More important still,
‘exact’ results exist for those cases in which Whittle gave ‘asymptotic’ solutions.

In the course of the derivations of these exact relationships, the author uses methods
which appear to be new (though bearing some relationship to André’s method of solving
linear difference equations; see C. Jordan, 1947, pp. 587-99) and are capable of wide
application in the theory of discrete linear stochastic processes. The use of exact
relationships does, moreover, lead to a considerable improvement in clarity and rigour.

In view of the fundamental exact relationships existing between the autocovariance
matrix and the spectral density function, the former might appropriately, in the case of
the stationary process, be termed the spectral density matrix of the process. Traditional
terminology has, however, been adhered to in the remainder of the paper.

The main topics treated below may be classified under the following headings.


(a) The circular process
(i) The exact relationship existing between the autocovariance matrix and the spectral
density function.
(ii) The latent roots of the autocovariance matrix.
(iii) The exact relationship between the canonical form of the autocovariance matrix
and the spectral density function.

(b) The non-circular process
(i) The exact relationship corresponding to (a) (i) above.
(ii) The exact inversion of non-circular autocovariance matrices.


* This discrete process may, however, be a sample realization drawn from a continuous process.




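(a) THE CIRCULAR PROCESS

Definitions

The vector of $N$ random variables, given by $x' = \{x_N, x_{N-1}, \ldots, x_1\}$, is assumed to have the
following distributional properties:

$$E x = 0, \tag{1}$$
$$E x_t^2 = \sigma^2 \quad \{t = 1, 2, \ldots, N\}, \tag{2}$$
$$E x_t x_{t+s} = \sigma^2 \rho_s \quad \{s = 1, 2, \ldots, N\}, \tag{3}$$

where $\rho_{N+L} = \rho_{N-L} = \rho_L$.

The sequence of values $\rho_1, \rho_2, \ldots, \rho_N$ is termed the autocorrelation function of the process.

$$E xx' = V \quad \text{(the autocovariance matrix)}$$

$$= \sigma^2 \begin{pmatrix}
1 & \rho_1 & \rho_2 & \rho_3 & \cdots & \rho_2 & \rho_1 \\
\rho_1 & 1 & \rho_1 & \rho_2 & \cdots & \rho_3 & \rho_2 \\
\rho_2 & \rho_1 & 1 & \rho_1 & \cdots & \rho_4 & \rho_3 \\
\vdots & & & & & & \vdots \\
\rho_1 & \rho_2 & \rho_3 & \rho_4 & \cdots & \rho_1 & 1
\end{pmatrix}, \tag{4}$$

an $N \times N$ matrix.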
Properties

The autocovariance matrix, $V$, may, from (4), be expressed in the following form:

$$V = \sigma^2\{I + \rho_1(W + W^{-1}) + \rho_2(W^2 + W^{-2}) + \ldots + \rho_{\frac12(N-1)}(W^{\frac12(N-1)} + W^{-\frac12(N-1)})\}, \tag{5}$$

where $N$ is a positive odd integer, or, alternatively,

$$V = \sigma^2\{I + \rho_1(W + W^{-1}) + \rho_2(W^2 + W^{-2}) + \ldots + \tfrac12\rho_{\frac12 N}(W^{\frac12 N} + W^{-\frac12 N})\},$$

where $N$ is a positive even integer. For all values of $N$, $I$ denotes the $N \times N$ identity matrix,
and $W$ denotes the $N \times N$ circulant definition of the auxiliary identity matrix, that is to say,

$$W = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{pmatrix}. \tag{6}$$

Since the matrices $(W^L + W^{-L})$ for $(L = 1, 2, \ldots, N)$ are commutative, they may all be
simultaneously reduced to canonical form by a single transformation.
Thus, if we consider the matrix $L$, satisfying the property

$$L'L = I, \tag{7}$$
$$L'(W^L + W^{-L})L = \mathrm{diag}\{\omega_1^L + \omega_1^{-L}, \ldots, \omega_N^L + \omega_N^{-L}\}, \tag{8}$$

where $\omega_1, \omega_2, \ldots, \omega_N$ are the $N$ values of the roots of the equation

$$\omega^N - 1 = 0, \tag{9}$$

then

$$L'VL = \mathrm{diag}(v_1, v_2, v_3, \ldots, v_N)$$

provides the canonical form of the autocovariance matrix $V$.


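Furthermore, we have

$$\omega_s^L + \omega_s^{-L} = 2\cos\frac{2\pi sL}{N} \quad (s = 1, 2, \ldots, N;\ L = 1, 2, \ldots, N). \tag{10}$$

The latent roots of $V$ are thus given by

$$v_s = \sigma^2\{1 + \rho_1(\omega_s + \omega_s^{-1}) + \rho_2(\omega_s^2 + \omega_s^{-2}) + \ldots + \rho_{\frac12(N-1)}(\omega_s^{\frac12(N-1)} + \omega_s^{-\frac12(N-1)})\}$$

$$= \sigma^2\left\{1 + 2\rho_1\cos\frac{2\pi s}{N} + 2\rho_2\cos\frac{4\pi s}{N} + \ldots + 2\rho_{\frac12(N-1)}\cos\frac{(N-1)\pi s}{N}\right\} \quad (s = 1, 2, \ldots, N)$$

for $N$ odd, or by

$$v_s = \sigma^2\{1 + \rho_1(\omega_s + \omega_s^{-1}) + \rho_2(\omega_s^2 + \omega_s^{-2}) + \ldots + \tfrac12\rho_{\frac12 N}(\omega_s^{\frac12 N} + \omega_s^{-\frac12 N})\}$$

$$= \sigma^2\left\{1 + 2\rho_1\cos\frac{2\pi s}{N} + 2\rho_2\cos\frac{4\pi s}{N} + \ldots + \rho_{\frac12 N}\cos\pi s\right\} \quad (s = 1, 2, \ldots, N) \tag{11}$$

for $N$ even.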


The relationships given by (5) and (11) are the correct restatement of similar forms given
by Whittle (1951, Chapter 4, §2, pp. 35-6), which are incorrect even as approximations in
the non-circular case.
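The exactness of (11) for finite $N$ is easy to confirm numerically. The following is a minimal sketch, not part of the original derivation: the autocorrelation values in `rho` are arbitrary illustrative numbers, and the helper names are hypothetical. It builds the circulant matrix (4), computes its latent roots directly, and compares them with the cosine sums of (11) for one odd and one even $N$.

```python
import numpy as np

def circulant_autocovariance(rho, N, sigma2=1.0):
    """N x N circular autocovariance matrix (4); rho[L-1] holds rho_L."""
    lag = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    lag = np.minimum(lag, N - lag)            # rho_{N-L} = rho_L
    r = np.concatenate(([1.0], rho))          # r[L] = rho_L, with rho_0 = 1
    return sigma2 * r[lag]

def latent_roots_formula(rho, N, sigma2=1.0):
    """Latent roots from (11), for s = 1, ..., N."""
    s = np.arange(1, N + 1)
    v = np.ones(N)
    for L in range(1, N // 2 + 1):
        # for even N the middle lag N/2 enters once, every other lag twice
        coeff = 1.0 if (N % 2 == 0 and L == N // 2) else 2.0
        v += coeff * rho[L - 1] * np.cos(2 * np.pi * s * L / N)
    return sigma2 * v

rho = np.array([0.5, 0.2, -0.1, 0.05])        # illustrative values only
results = {}
for N in (7, 8):
    V = circulant_autocovariance(rho, N)
    numeric = np.sort(np.linalg.eigvalsh(V))
    formula = np.sort(latent_roots_formula(rho, N))
    results[N] = bool(np.allclose(numeric, formula))
print(results)
```

The comparison is made on sorted roots because the ordering returned by the eigenvalue routine need not follow the index $s$.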

General rules for the multiplicities of the latent roots of $V$ have been incorrectly stated
by both Whittle (1951, Chapter 4, §3, p. 37) and Watson (1951*, Chapter 2, §5, p. 50).

Results correctly obtained by R. L. Anderson (1942, §5a, pp. 7-8), on the multiplicities
in the values of the terms in the sequence,

$$\cos\frac{2\pi L}{N},\ \cos\frac{4\pi L}{N},\ \cos\frac{6\pi L}{N},\ \ldots,\ \cos 2\pi L, \tag{12}$$

may, however, be used to derive valid statements relating to the multiplicities in the latent
roots of $V$.

Let $N = nk$ and $L = lk$, where $n$, $l$ and $k$ are positive integers, $n$ and $l$ being prime to one
another. Two cases may then be distinguished.

(i) For $n$ odd, there are $\tfrac12(n-1)$ terms in the series (12) occurring with $2k$-fold
multiplicities, together with the value unity occurring with $k$-fold multiplicity.

(ii) For $n$ even, there are $\tfrac12(n-2)$ terms in the series (12) occurring with $2k$-fold
multiplicities, together with each of the values $\pm 1$ occurring with $k$-fold multiplicities. For the
derivation of these results the reader is referred to R. L. Anderson’s paper.
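Both cases can be checked by direct enumeration of the series (12). The sketch below is illustrative only; the function names are hypothetical, and values are rounded to ten decimals so that floating-point noise does not split equal cosines.

```python
from collections import Counter
from math import cos, gcd, pi

def series_multiplicities(N, L):
    """Multiplicity of each distinct value of cos(2*pi*s*L/N), s = 1, ..., N."""
    values = [round(cos(2 * pi * s * L / N), 10) + 0.0 for s in range(1, N + 1)]
    return Counter(values)

def agrees_with_cases(N, L):
    """Compare the enumeration with cases (i) and (ii), where N = nk, L = lk."""
    k = gcd(N, L)
    n = N // k
    # found maps a multiplicity to the number of distinct values having it
    found = Counter(series_multiplicities(N, L).values())
    if n % 2 == 1:
        expected = Counter({2 * k: (n - 1) // 2, k: 1})   # case (i)
    else:
        expected = Counter({2 * k: (n - 2) // 2, k: 2})   # case (ii)
    return found == +expected                             # unary + drops zero counts

print(series_multiplicities(12, 8))   # n = 3, k = 4
print(series_multiplicities(12, 3))   # n = 4, k = 3
print(all(agrees_with_cases(N, L) for N in range(2, 25) for L in range(1, N + 1)))
```

With $N = 12$ and $L = 8$, for example, $k = 4$ and $n = 3$, so case (i) predicts one value with eightfold multiplicity and the value unity with fourfold multiplicity.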

Although it is impossible to enumerate here all the possibilities in detail, a ‘rule’ which 
the author has found useful in applications of the theory to sampling problems will be given 
relating to the multiplicities in the latent roots of V. This ‘rule’ is adequate for most of the 
cases which occur in practice. The ‘rule’ is not, however, a theorem, since exceptions to it 
occur when certain relationships exist between the elements of V. Care must therefore be 
taken when it is applied. This ‘rule’ is as follows: 

If $\rho_L \neq 0$ for at least one value of $L$ satisfying the condition $N = nk$ and $L = lk$, $n$ and $l$
being prime to one another, then the multiplicities in the latent roots will be at most $2k$-fold.

Whittle's assertion that if $N$ is odd there are $\tfrac12(N-1)$ pairs of distinct roots, with a single
root greater than or less than all of these others; and if $N$ is even there are $\tfrac12(N-2)$ pairs of
distinct roots with one root greater than and one root less than all of the other roots, is
incorrect.


* Watson states here that 'real symmetric circulant matrices have at most one odd latent root, the
rest being pairwise equal’. That is, of course, incorrect.

An example occurring in the literature on time-series analysis, demonstrating a practical
instance in which the ‘rule’ stated above yields the correct result, whereas Whittle’s
assertion breaks down, is the lag-$L$ definition of the Markoff process, given by

$$x_t - \rho x_{t-L} = \epsilon_t \quad (t = 1, 2, \ldots, N), \tag{13}$$

where $x_{N+s} = x_s$, $N$ and $L$ not being prime to one another.

The spectral function of a circular process


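The spectral density function of the circular process possessing the distributional
properties (1), (2) and (3) is of the form*

$$v(\theta) = \sigma^2\{1 + 2\rho_1\cos\theta + 2\rho_2\cos 2\theta + \ldots + 2\rho_{\frac12(N-1)}\cos\tfrac12(N-1)\theta\} \quad \left(\theta = \frac{2\pi}{N}, \frac{4\pi}{N}, \ldots, 2\pi\right),$$

where $N$ is odd; or of the form

$$v(\theta) = \sigma^2\{1 + 2\rho_1\cos\theta + 2\rho_2\cos 2\theta + \ldots + \rho_{\frac12 N}\cos\tfrac12 N\theta\} \quad \left(\theta = \frac{2\pi}{N}, \frac{4\pi}{N}, \ldots, 2\pi\right), \tag{14}$$

where $N$ is even.

The latent roots of $V$, given by (11), are thus the values of the spectral density function,
$v(\theta)$, taken at constant intervals of $2\pi/N$ in the values of $\theta$.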

These roots may thus be regarded as being generated by the spectral density function of 
the process. 

The results obtained so far are, of course, exact for finite N. By considering the limiting 
values of the roots as N tends to infinity, asymptotic results may be obtained. 


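The autoregressive-moving average process

The discrete circular linear process of the autoregressive-moving average type may be
written in the form

$$x_t + \alpha_1 x_{t-1} + \ldots + \alpha_k x_{t-k} = \epsilon_t + \beta_1\epsilon_{t-1} + \ldots + \beta_h\epsilon_{t-h} \quad \{t = 1, 2, \ldots, N\}, \tag{15}$$

where $x_{N+s} = x_s$, $\epsilon_{N+s} = \epsilon_s$, and

$$E\epsilon_t = 0 \quad \{t = 1, 2, \ldots, N\},$$
$$E\epsilon_t^2 = \sigma^2 \quad \{t = 1, 2, \ldots, N\}, \tag{16}$$
$$E\epsilon_s\epsilon_t = 0 \quad \{s \neq t\}.$$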

* Any discrete stationary purely non-deterministic process may be written in the form

$$x_t = \epsilon_t + \beta_1\epsilon_{t-1} + \beta_2\epsilon_{t-2} + \ldots,$$

where the summation extends to infinity.
The spectral density function of this process is defined as

$$v(\theta) = (1 + \beta_1 e^{i\theta} + \beta_2 e^{2i\theta} + \ldots)(1 + \beta_1 e^{-i\theta} + \beta_2 e^{-2i\theta} + \ldots)\,\mathrm{var}\,\epsilon$$

for both circular and non-circular definitions of the process, for samples consisting of any number of
consecutive observations. The relation (14) is obtained by expanding $v(\theta)$, thus defined, as a Fourier
series.


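In the matrix notation the process becomes

$$(I + \alpha_1 W + \ldots + \alpha_k W^k)\,x = (I + \beta_1 W + \ldots + \beta_h W^h)\,\epsilon, \tag{17}$$

where $x' = \{x_N, x_{N-1}, \ldots, x_1\}$ and $\epsilon' = \{\epsilon_N, \epsilon_{N-1}, \ldots, \epsilon_1\}$.

Solving (17) for $x$, we obtain

$$x = (I + \alpha_1 W + \ldots + \alpha_k W^k)^{-1}(I + \beta_1 W + \ldots + \beta_h W^h)\,\epsilon. \tag{18}$$

Thus, from (18), we obtain

$$E x = 0, \tag{19}$$

and

$$E xx' = V \quad \text{(the autocovariance matrix)}$$

$$= \sigma^2 (I + \alpha_1 W + \ldots + \alpha_k W^k)^{-1}(I + \beta_1 W + \ldots + \beta_h W^h)(I + \beta_1 W^{-1} + \ldots + \beta_h W^{-h})(I + \alpha_1 W^{-1} + \ldots + \alpha_k W^{-k})^{-1}. \tag{20}$$

The latent roots of $V$ are therefore given by

$$v_s = \sigma^2\,\frac{(1 + \beta_1\omega_s + \ldots + \beta_h\omega_s^h)(1 + \beta_1\omega_s^{-1} + \ldots + \beta_h\omega_s^{-h})}{(1 + \alpha_1\omega_s + \ldots + \alpha_k\omega_s^k)(1 + \alpha_1\omega_s^{-1} + \ldots + \alpha_k\omega_s^{-k})}$$

$$= \sigma^2\,\frac{(1 + \beta_1 e^{2\pi is/N} + \ldots + \beta_h e^{2\pi ihs/N})(1 + \beta_1 e^{-2\pi is/N} + \ldots + \beta_h e^{-2\pi ihs/N})}{(1 + \alpha_1 e^{2\pi is/N} + \ldots + \alpha_k e^{2\pi iks/N})(1 + \alpha_1 e^{-2\pi is/N} + \ldots + \alpha_k e^{-2\pi iks/N})} \quad (s = 1, 2, \ldots, N). \tag{21}$$

The spectral density function of the circulant process (15) may be written as

$$v(\theta) = \sigma^2\,\frac{(1 + \beta_1 e^{i\theta} + \ldots + \beta_h e^{ih\theta})(1 + \beta_1 e^{-i\theta} + \ldots + \beta_h e^{-ih\theta})}{(1 + \alpha_1 e^{i\theta} + \ldots + \alpha_k e^{ik\theta})(1 + \alpha_1 e^{-i\theta} + \ldots + \alpha_k e^{-ik\theta})} \quad (0 \le \theta \le 2\pi)$$

$$= g(e^{i\theta}) \quad \text{(let us say)} \quad (0 \le \theta \le 2\pi). \tag{22}$$

It follows from (20) and (22) that

$$V = g(W), \tag{23}$$

and, in addition,

$$v_s = g(\omega_s) = g(e^{2\pi is/N}) \quad (s = 1, 2, \ldots, N). \tag{24}$$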


These results are exact, and are fundamental to the exact treatment of circular processes. 
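A short numerical illustration of (20), (23) and (24) follows. It is a sketch under stated assumptions: a first-order autoregressive, first-order moving-average circular process with arbitrary coefficients `alpha` and `beta`, and `np.roll` used to build the circulant auxiliary identity matrix of (6).

```python
import numpy as np

N, alpha, beta, sigma2 = 9, -0.6, 0.4, 1.0    # illustrative values only
I = np.eye(N)
W = np.roll(I, 1, axis=1)                     # circulant auxiliary identity matrix (6)

A = I + alpha * W                             # autoregressive operator
B = I + beta * W                              # moving-average operator
# (20), using the fact that W' = W^{-1} for the circulant W
V = sigma2 * np.linalg.inv(A) @ B @ B.T @ np.linalg.inv(A.T)

def g(z):
    """The rational function of (22), evaluated at a point z on the unit circle."""
    return sigma2 * (1 + beta * z) * (1 + beta / z) / ((1 + alpha * z) * (1 + alpha / z))

omega = np.exp(2j * np.pi * np.arange(1, N + 1) / N)   # roots of omega^N = 1
roots_formula = np.sort(g(omega).real)                 # (24)
roots_numeric = np.sort(np.linalg.eigvalsh(V))

is_circulant = bool(np.allclose(V @ W, W @ V))
roots_agree = bool(np.allclose(roots_numeric, roots_formula))
print(is_circulant, roots_agree)
```

The check that `V` commutes with `W` reflects the statement $V = g(W)$: any function of $W$ is itself circulant.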
An asymptotic relationship between $V$ (in the non-circular case, to be further discussed
in §(b) of this paper) and $g(W)$ has been given by Whittle (1951). It is not at all clear in what
sense this asymptotic relationship is valid, unless the exact form is used. Whittle (1952,
Theorem 1, (6) and (7), pp. 49-50), and Wold (1953, Chapter 11, §2, Theorem 1) use the
following interpretation of the asymptotic relationship:

$$\lim_{N\to\infty}\frac{x'Vx}{x'g(W)x} = 1 \tag{25}$$

for almost all realizations of the process. This result, though valid, is difficult to work with
rigorously. Whittle does not adhere to it in any of his derivations, and for a rigorous
treatment it is preferable to work throughout with exact relationships. The latter are no more
cumbersome than the asymptotic forms, and their use may lead to the avoidance of
substantial errors.




This is abundantly illustrated in the case of the first-order autoregressive process in the
circular case. The largest latent root of the autocorrelation matrix for $\alpha_1 < 0$ is given by
$\dfrac{(1-\alpha_1)\{1-(-\alpha_1)^N\}}{(1+\alpha_1)\{1+(-\alpha_1)^N\}}$. The corresponding asymptotic form of this is $\dfrac{1-\alpha_1}{1+\alpha_1}$, which differs
substantially from the exact value when $N$ is small and $-\alpha_1$ is close to unity. In fact, as $-\alpha_1$
tends to unity, the exact form tends in value to $N$, while the asymptotic form tends to
infinity.
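The contrast between the exact and asymptotic values can be seen numerically. The sketch below is an illustration rather than the author's computation: it assumes the circular first-order process $x_t + \alpha_1 x_{t-1} = \epsilon_t$, and the closed form coded in `largest_root_closed_form` is one derived here from (20) for that process, consistent with the limiting behaviour described in the text.

```python
import numpy as np

def largest_root_numeric(alpha1, N):
    """Largest latent root of the circular AR(1) autocorrelation matrix."""
    I = np.eye(N)
    W = np.roll(I, 1, axis=1)                 # circulant auxiliary identity matrix
    A = I + alpha1 * W
    V = np.linalg.inv(A.T @ A)                # (20) with sigma^2 = 1 and no moving-average part
    R = V / V[0, 0]                           # autocorrelation matrix (constant diagonal)
    return np.linalg.eigvalsh(R)[-1]

def largest_root_closed_form(alpha1, N):
    """Closed form derived for this sketch (an assumption, see lead-in)."""
    r = -alpha1
    return (1 + r) * (1 - r**N) / ((1 - r) * (1 + r**N))

def largest_root_asymptotic(alpha1):
    return (1 - alpha1) / (1 + alpha1)

alpha1, N = -0.99, 10
exact = largest_root_numeric(alpha1, N)
closed = largest_root_closed_form(alpha1, N)
asymptotic = largest_root_asymptotic(alpha1)
print(exact, closed, asymptotic)
```

Since the latent roots of an autocorrelation matrix sum to $N$, no single root can exceed $N$, whereas the asymptotic expression is unbounded as $-\alpha_1$ approaches unity.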

A final point which should be made on the subject of the latent roots of circulant variance
matrices is that in the derivation of the values of, and of the multiplicities in these roots,
it is usually more convenient to consider the spectral density function in the form (22) than
in the form (14).


(b) THE NON-CIRCULAR PROCESS 


Whittle (1951, Chapter 4, §2, relations (4·253), (4·262) and (4·28), pp. 34-6) used the
‘asymptotic’ form of (23) to obtain a simple method for the ‘approximate’ inversion of $V$
in the non-circular case. For many purposes, approximate inversion is not adequate, and
is not in any case necessary. It will be shown below that a very simple method of exact
inversion is available. Whittle (1951, Chapter 4, §2, relations (4·242)-(4·252), pp. 33-4)
does give a cumbersome method of exact inversion based on the auxiliary identity matrix,
denoted by $U$, where

$$U = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix}, \tag{26}$$

an $N \times N$ matrix. Unfortunately Whittle's method appears to give the required result only
in the case of the first-order autoregressive process.

An exact method of inversion will be given below which utilizes a non-circular counter- 
part of (20), the relation between the autocovariance matrix and the spectral density 
function. 

The non-circular discrete linear process, generated by the relationship 


$$x_t + \alpha_1 x_{t-1} + \ldots + \alpha_k x_{t-k} = \epsilon_t + \beta_1\epsilon_{t-1} + \ldots + \beta_h\epsilon_{t-h}, \tag{27}$$


will be assumed to be semi-infinite. This does not prevent us from deriving results from 
samples consisting of N successive terms of the process. Such results for finite samples 
follow immediately from the case of the semi-infinite process. This is due to the fact that the 
autocovariance function and the spectral density function are both completely independent 
of N, a property which does not hold in the circular case. 

We will assume (27) to represent a purely non-deterministic stationary process. An
important theorem obtained by Wold (1954, Chapter 2, §20, pp. 84-9, especially Theorem 7,
p. 89) states, furthermore, that every semi-infinite, stationary, purely non-deterministic
process can be expressed in the form (27).

A condition which is both necessary and sufficient for the stationarity of the process (27)
can be obtained from a result given by Doob (1953).


The spectral density function of the process (27) is given by

$$g(e^{i\theta}) = \sigma^2\,\frac{(1 + \beta_1 e^{i\theta} + \ldots + \beta_h e^{ih\theta})(1 + \beta_1 e^{-i\theta} + \ldots + \beta_h e^{-ih\theta})}{(1 + \alpha_1 e^{i\theta} + \ldots + \alpha_k e^{ik\theta})(1 + \alpha_1 e^{-i\theta} + \ldots + \alpha_k e^{-ik\theta})} \quad (0 \le \theta \le 2\pi).$$

It may, in passing, be noted that this is the same as (22), given for the circular process.
According to Doob (1953, Chapter 10, pp. 452-506, especially §10, pp. 501-6), the
necessary and sufficient condition for a stochastic process with the spectral function (22)
to be stationary in the ‘wide sense’ (Chapter 2, §8, pp. 94-5) is that the equation

$$z^h + \beta_1 z^{h-1} + \ldots + \beta_h = 0$$

shall have no roots outside the unit circle, and that the roots of the equation

$$z^k + \alpha_1 z^{k-1} + \ldots + \alpha_k = 0$$

shall all lie inside the unit circle. The reader is referred to Doob’s* penetrating exposition
for a more detailed treatment of this topic.
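In practice the two root conditions can be checked numerically. A minimal sketch follows, with hypothetical coefficient values and a hypothetical helper name; it uses `numpy.roots`, and a small tolerance is allowed for roots lying on the unit circle in the moving-average case.

```python
import numpy as np

def roots_within_unit_circle(coeffs, strict=True, tol=1e-12):
    """coeffs = [c_1, ..., c_m] for z^m + c_1 z^(m-1) + ... + c_m = 0.

    strict=True  : every root must lie inside the unit circle.
    strict=False : no root may lie outside it (roots on the circle allowed).
    """
    moduli = np.abs(np.roots([1.0, *coeffs]))
    if strict:
        return bool(np.all(moduli < 1.0))
    return bool(np.all(moduli <= 1.0 + tol))

alpha = [-0.5, 0.3]    # hypothetical autoregressive coefficients alpha_1, alpha_2
beta = [1.0]           # hypothetical moving-average coefficient beta_1 (root at z = -1)

ar_condition = roots_within_unit_circle(alpha, strict=True)
ma_condition = roots_within_unit_circle(beta, strict=False)
print(ar_condition, ma_condition)
```

Here the autoregressive equation $z^2 - 0.5z + 0.3 = 0$ has a complex pair of roots of modulus $\sqrt{0.3}$, so both conditions hold for these illustrative values.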
The relationship (27) may be transcribed as

$$(I + \alpha_1 U + \ldots + \alpha_k U^k)\,x = (I + \beta_1 U + \ldots + \beta_h U^h)\,\epsilon, \tag{28}$$

where $x$ and $\epsilon$ are semi-infinite vectors, and $U$ is the semi-infinite auxiliary identity matrix.
Solving (28) for $x$ leads to

$$x = (I + \alpha_1 U + \ldots + \alpha_k U^k)^{-1}(I + \beta_1 U + \ldots + \beta_h U^h)\,\epsilon. \tag{29}$$

From (29) we deduce

$$E x = 0, \tag{30}$$

and

$$V = E xx' = \sigma^2 (I + \alpha_1 U + \ldots + \alpha_k U^k)^{-1}(I + \beta_1 U + \ldots + \beta_h U^h)(I + \beta_1 U' + \ldots + \beta_h U'^h)(I + \alpha_1 U' + \ldots + \alpha_k U'^k)^{-1}, \tag{31}$$

which is thus the autocovariance matrix of the process (27). This is the exact, non-circular
counterpart to the relationships (20) and (23), given above for the case of the circular process.
Relationship (31) bears a close affinity to the ‘autocovariance generating function’ given
by

$$g(z) = \sigma^2\,\frac{(1 + \beta_1 z + \ldots + \beta_h z^h)(1 + \beta_1 z^{-1} + \ldots + \beta_h z^{-h})}{(1 + \alpha_1 z + \ldots + \alpha_k z^k)(1 + \alpha_1 z^{-1} + \ldots + \alpha_k z^{-k})}. \tag{32}$$

However, (31) is much more useful. From it can be obtained directly (i) the autocovariance
function of the process (27), this being the property (31) shares with (32); (ii) the inverse
of $V$ in its exact form; (iii) the integral and rational powers of $V$ and $V^{-1}$.

Thus we have, from the elementary rule on the inversion of a product of several matrices,

$$\sigma^2 V^{-1} = (I + \alpha_1 U' + \ldots + \alpha_k U'^k)(I + \beta_1 U' + \ldots + \beta_h U'^h)^{-1}(I + \beta_1 U + \ldots + \beta_h U^h)^{-1}(I + \alpha_1 U + \ldots + \alpha_k U^k), \tag{33}$$

which is no more difficult to evaluate exactly than the matrix product in (31), which is the
autocovariance matrix itself.

It is apparent from (31) and (33) that although $V$ is a Laurent matrix, $V^{-1}$ is not, a
property already noted by Whittle.
* See also Wise (1955, Chapter III) for a further analysis of this problem. 




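The autoregressive process

To illustrate the application of the above results to problems of importance which have not
been solved in the literature even in special cases, the inverse of the autocovariance matrix
of the autoregressive process generated by the relation

$$x_t + \alpha_1 x_{t-1} + \ldots + \alpha_k x_{t-k} = \epsilon_t \tag{34}$$

will now be derived for the semi-infinite process.
From this result the inverses of the autocovariance matrices of $N$ successive observations
from the process (34) are then given in full for $k = 1$, $k = 2$ and $k = 3$.
From (33) we deduce for the special case (34) the result

$$\sigma^2 V^{-1} = (I + \alpha_1 U' + \ldots + \alpha_k U'^k)(I + \alpha_1 U + \ldots + \alpha_k U^k). \tag{35}$$

Taking $k = 1$, 2 and 3 respectively, we deduce at once from (35) the matrices

(i) $k = 1$:

$$\sigma^2 V^{-1} = \begin{pmatrix}
1 & \alpha_1 & 0 & \cdots & 0 & 0 \\
\alpha_1 & 1+\alpha_1^2 & \alpha_1 & \cdots & 0 & 0 \\
0 & \alpha_1 & 1+\alpha_1^2 & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1+\alpha_1^2 & \alpha_1 \\
0 & 0 & 0 & \cdots & \alpha_1 & 1
\end{pmatrix}, \tag{36}$$

an $N \times N$ matrix.

(ii) $k = 2$:

$$\sigma^2 V^{-1} = \begin{pmatrix}
1 & \alpha_1 & \alpha_2 & 0 & \cdots & 0 & 0 \\
\alpha_1 & 1+\alpha_1^2 & \alpha_1+\alpha_1\alpha_2 & \alpha_2 & \cdots & 0 & 0 \\
\alpha_2 & \alpha_1+\alpha_1\alpha_2 & 1+\alpha_1^2+\alpha_2^2 & \alpha_1+\alpha_1\alpha_2 & \cdots & 0 & 0 \\
0 & \alpha_2 & \alpha_1+\alpha_1\alpha_2 & 1+\alpha_1^2+\alpha_2^2 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1+\alpha_1^2 & \alpha_1 \\
0 & 0 & 0 & 0 & \cdots & \alpha_1 & 1
\end{pmatrix}, \tag{37}$$

an $N \times N$ matrix.

(iii) $k = 3$:

$$\sigma^2 V^{-1} = \begin{pmatrix}
1 & \alpha_1 & \alpha_2 & \alpha_3 & \cdots & 0 & 0 \\
\alpha_1 & 1+\alpha_1^2 & \alpha_1+\alpha_1\alpha_2 & \alpha_2+\alpha_1\alpha_3 & \cdots & 0 & 0 \\
\alpha_2 & \alpha_1+\alpha_1\alpha_2 & 1+\alpha_1^2+\alpha_2^2 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & \cdots & 0 & 0 \\
\alpha_3 & \alpha_2+\alpha_1\alpha_3 & \alpha_1+\alpha_1\alpha_2+\alpha_2\alpha_3 & 1+\alpha_1^2+\alpha_2^2+\alpha_3^2 & \cdots & 0 & 0 \\
\vdots & & & & & & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1+\alpha_1^2 & \alpha_1 \\
0 & 0 & 0 & 0 & \cdots & \alpha_1 & 1
\end{pmatrix}. \tag{38}$$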

The result (36) confirms that obtained for $k = 1$ by Cochrane & Orcutt (1949, equation (6·14),
p. 57). However, the results (37) and (38) for $k = 2$ and $k = 3$ respectively do not appear
to have been given previously in the literature.* So far as the author is aware, (35) is new,
and the inverses of the autocovariance matrices for $k = 4, 5, 6, \ldots$, etc., can be written down
immediately from it, rendering ‘approximate’ methods of inversion unnecessary.
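The banded form can be verified numerically for a particular case. The sketch below is illustrative and rests on stated assumptions: the coefficients are hypothetical, the autocovariances are obtained from a truncated moving-average expansion (an approximation, though a very accurate one here), and the lower-right corner of the finite matrix is filled by reflecting the upper-left corner, as the layouts of (36) to (38) indicate. It requires $N \ge 2k$.

```python
import numpy as np

def ar_autocovariances(alpha, nlags, sigma2=1.0, terms=2000):
    """Autocovariances at lags 0..nlags-1 of the process (34), approximated
    from a moving-average expansion truncated after `terms` coefficients."""
    k = len(alpha)
    psi = np.zeros(terms)
    psi[0] = 1.0
    for j in range(1, terms):
        for r in range(1, min(j, k) + 1):
            psi[j] -= alpha[r - 1] * psi[j - r]
    return sigma2 * np.array([psi[: terms - s] @ psi[s:] for s in range(nlags)])

def exact_inverse(alpha, N):
    """sigma^2 V^{-1} for N successive observations (N >= 2k), built from (35)."""
    A = np.eye(N)
    for r in range(1, len(alpha) + 1):
        A += alpha[r - 1] * np.eye(N, k=r)    # alpha_r U^r, with U as in (26)
    P = A.T @ A                               # leading N x N block of (35)
    i, j = np.indices((N, N))
    # the finite inverse is symmetric about both diagonals, so the
    # lower-right half is the reflection of the upper-left half
    return np.where(i + j <= N - 1, P, P[::-1, ::-1])

alpha, N = [-0.5, 0.3], 7                     # hypothetical second-order process
gamma = ar_autocovariances(alpha, N)
lag = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
V = gamma[lag]                                # autocovariance matrix, sigma^2 = 1
M = exact_inverse(alpha, N)
agrees = bool(np.allclose(np.linalg.inv(V), M, atol=1e-8))
print(agrees)
print(np.round(M, 3))
```

The printed matrix has the pattern of (37): a band of width $2k + 1$ with modified entries only in the two $k \times k$ corners.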


* Note, however, Champernowne (1948, p. 206, equation 3·5), which gives the inverse implicitly
for the case of the autoregressive process.


REFERENCES


ANDERSON, R. L. (1942). Distribution of the serial correlation coefficient. Ann. Math. Statist. 13, 1.

CHAMPERNOWNE, D. G. (1948). Sampling theory applied to autoregressive schemes. J. R. Statist. Soc.

COCHRANE, D. & ORCUTT, G. H. (1949). Application of least squares regression to relationships
containing autocorrelated error terms. J. A

DOOB, J. L. (1953). Stochastic Processes. New York: John Wiley and Sons.

JORDAN, C. (1947). Calculus of Finite Differences. New York: Chelsea Publishing Co.

KHINTCHINE, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Math. Ann.
109, 604.

WATSON, G. S. (1951). Serial correlation in regression analysis. Unpublished Ph.D. thesis and Inst.
of Stat. Mimeo. Series, North Carolina State College.

WHITTLE, P. (1951). Hypothesis Testing in Time-Series Analysis. Uppsala: Almqvist and Wiksell.

WHITTLE, P. (1952). Some results in time-series analysis. Skand. AktuarTidskr. 35, 200.

WIENER, N. (1930). Generalized harmonic analysis. Acta Math. 55, 117.

WISE, J. (1955). Ph.D. Thesis (unpublished), University of London.

WOLD, H. (1953). Demand Analysis. New York: John Wiley and Sons.

WOLD, H. (1954). A Study in the Analysis of Stationary Time Series. Uppsala: Almqvist and Wiksell.




SAMPLING PROPERTIES OF LOCAL STATISTICS IN STATIONARY 
STOCHASTIC SERIES 


By G. Н. JOWETT 
University of Sheffield 


1. INTRODUCTION 


The sampling moments of large-sample statistics calculated from stationary time series 
usually require for their evaluation the summation of long series of cumulants or products 
of cumulants of the series, or for normal series the summation of long series of
autocorrelations or products of autocorrelations (cf. Bartlett, 1946). These series usually converge only
as the cumulants in question tend to zero, which is often in practice so slow a process as to 
make their evaluation a matter of considerable difficulty. There is, however, a class of 
statistics which may be called local in that they are dependent on short-term comparisons 
of terms in the series, and for these the summations are of such a kind as to converge as 
certain finite differences of the cumulants tend to zero; in practice this often happens much 
more quickly than the tending to zero of the cumulants themselves, and the corresponding 
formulae are accordingly much easier to evaluate in practice. 

A simple illustration of this difference is provided by a comparison between the simplest 
statistic which depends on local comparisons and the simplest statistic which does not. 
It is advantageous to give this illustration before attempting a general discussion, so as 
to give a simple outline of the argument and the ideas involved. 

Suppose $x_1, x_2, \ldots, x_{2n-1}, x_{2n}$ to be a set of $2n$ evenly spaced observations from a stationary
time series with mean $\mu$, variance $\sigma^2$, and autocorrelation function $\rho_s$. The sampling variance
of the statistic

$$U = n^{-1}[(x_1 - x_2) + (x_3 - x_4) + \ldots + (x_{2n-1} - x_{2n})] \tag{1}$$

is determined as follows:

$$\mathrm{var}\,U = n^{-2}\sum_i\sum_j \mathrm{cov}[(x_{2i-1} - x_{2i}), (x_{2j-1} - x_{2j})] \tag{2}$$

$$= n^{-2}\sum_i\sum_j \sigma^2(2\rho_{2i-2j} - \rho_{2i-2j-1} - \rho_{2i-2j+1}) \tag{3}$$

$$= n^{-1}\sum_{s=-(n-1)}^{n-1}\left(1 - \frac{|s|}{n}\right)(-\sigma^2\Delta''\rho_{2s}), \tag{4}$$

where $\Delta''$ is the second central difference operator in Comrie's notation, and refers to the
suffix of $\rho$. The need for this statistic and its sampling variance might arise, for example, in
the estimation of the treatment effect in a systematic experiment having the pattern

T C T C ... T C (T = treatment; C = control), (5)


or, alternatively, in the statistical control of the difference between the means of two
interpenetrating systematic samples (Jowett, 1955 b).
On the other hand, the sampling variance of the simple arithmetic mean

$$T = (2n)^{-1}(x_1 + x_2 + \ldots + x_{2n}) \tag{6}$$

is easily shown to be given by the formula

$$\mathrm{var}\,T = \frac{\sigma^2}{2n}\sum_{s=-(2n-1)}^{2n-1}\left(1 - \frac{|s|}{2n}\right)\rho_s. \tag{7}$$

In many applications, as $|s|$ increases the autocorrelation function tends to become
‘locally linear’, i.e. approximately linear over intervals of width equal, say, to twice the
interval between successive terms of the series; as $|s|$ increases further it gradually tends
to zero. Thus $\Delta''\rho_{2s}$ may attain a sufficient degree of smallness to be neglected in
computing the approximate value of the sum in (4) at a much earlier stage in the increase of
$|s|$ than that for which $\rho_s$ can be neglected in computing the approximate value of the
sum in (7).
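The two variance formulae can be checked against direct evaluation of the corresponding quadratic forms. The sketch below is illustrative only: it assumes a hypothetical slowly decaying autocorrelation function $\rho_s = e^{-0.05|s|}$, takes $T$ to be the mean of all $2n$ observations, and counts how many non-negative lags contribute more than a small threshold to each sum.

```python
import numpy as np

def rho(s, k=0.05):
    """Hypothetical autocorrelation function rho_s = exp(-k|s|)."""
    return np.exp(-k * np.abs(s))

n, sigma2 = 50, 1.0
t = np.arange(2 * n)
Sigma = sigma2 * rho(np.subtract.outer(t, t))     # covariance matrix of x_1, ..., x_2n

c_U = np.where(t % 2 == 0, 1.0, -1.0) / n         # weights of U in (1)
c_T = np.full(2 * n, 1.0 / (2 * n))               # weights of the mean T
var_U_direct = c_U @ Sigma @ c_U
var_T_direct = c_T @ Sigma @ c_T

s = np.arange(-(n - 1), n)
d2 = rho(2 * s + 1) - 2 * rho(2 * s) + rho(2 * s - 1)   # second central difference of rho at 2s
var_U_formula = np.sum((1 - np.abs(s) / n) * (-sigma2 * d2)) / n          # formula (4)

s2 = np.arange(-(2 * n - 1), 2 * n)
var_T_formula = sigma2 / (2 * n) * np.sum((1 - np.abs(s2) / (2 * n)) * rho(s2))

threshold = 1e-3
lags_needed_U = int(np.sum(np.abs(d2[s >= 0]) > threshold))
lags_needed_T = int(np.sum(rho(s2[s2 >= 0]) > threshold))
print(var_U_direct, var_U_formula)
print(var_T_direct, var_T_formula)
print(lags_needed_U, lags_needed_T)
```

For this choice of $\rho_s$ the differenced terms fall below the threshold after far fewer lags than the autocorrelations themselves, which is the point made in the paragraph above.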

Now experience and a priori considerations (cf. Jowett, 1953) suggest that for many
series observed in practice the graph of the serial variation function

$$\delta_s = E[\tfrac12(x_t - x_{t+s})^2] \tag{8}$$

is zero for $s = 0$, rises steeply as $s$ increases with gradually decreasing gradient and curvature,
and ultimately flattens out when $s$ becomes large, rather like the graph of the function

$$y = A(1 - e^{-ks})$$

for positive values of the constants $A$ and $k$.
Since

$$\delta_s = \sigma^2(1 - \rho_s), \tag{9}$$

it follows that

$$-\Delta''\sigma^2\rho_s = \Delta''\delta_s, \tag{10}$$

and hence that the graph of $\sigma^2\rho_s$ will have the same shape but inverted. The serial variation
function has the advantage that for short lags it is effectively independent of long-term
variation in the series and requires no reference to the series mean, qualities which make
it eminently suitable for use in the study of local variation. If for $s \ge s_0(\epsilon)$ the serial variation
function has become approximately linear to such an extent that $\Delta''\delta_{2s}$ is less than some
predetermined small quantity $\epsilon$, we deduce from (4) that

$$\mathrm{var}\,U = \frac{1}{n}\sum_{s=-(s_0-1)}^{s_0-1}\left(1 - \frac{|s|}{n}\right)\Delta''\delta_{2s} + R, \tag{11}$$

where

$$|R| < \frac{2\epsilon}{n}\sum_{s=s_0}^{n-1}\left(1 - \frac{s}{n}\right). \tag{12}$$

Hence the error in the estimate of var $U$ committed by truncating the summation at $\pm(s_0 - 1)$
is at worst of the same order of magnitude as the departure from local linearity of $\delta_s$ over the
rest of the range, which is usually the same as that of the first term omitted from the
summation, since $|\Delta''\delta_{2s}|$ will usually tend to decrease monotonically for $s > s_0$.

Since we are usually obliged to infer the stochastic properties from one or at most a very
few realizations of the series itself, it will often be true that var $U$ is much easier to estimate
than var $T$; the need to estimate the form of $\rho_s$ for large lags, often a matter of considerable
difficulty, is not present, and the formula (4) may be computed approximately (provided,
of course, that $\epsilon = o(n^{-1})$ and $s_0 \ll n$) using serial variation statistics with relatively short
lags, which are usually readily available.


It will be observed that $U$ is a function of the set of differences $x_1 - x_2$, $x_3 - x_4$, ..., which
might be described as local comparisons, since they are constructed from neighbouring terms of
the series. It is this which gives rise to the differencing in (4), which makes the formula
independent of $\sigma^2$, and hence expressible entirely in terms of $\delta_s$, and which enables var $U$
to be estimated in many circumstances where it would be impossible to estimate var $T$.
A corresponding property has been observed in other statistics, such as the mean of a
systematic sample (cf. Jowett, 1952) or a trend-reduced regression coefficient (Jowett, 1955a)
which are functions of local comparisons alone, and suggests that the sampling properties
of statistics which are built up entirely from local comparisons are essentially determined
by the local variational properties (e.g. $\delta_s$ for small $s$) of the series themselves, and are
effectively independent of the long-term variational properties of the series which are so
difficult to measure in practice. The rest of this paper will be concerned with establishing
this principle, and with illustrating it by some specific formulae.


2. SAMPLING MOMENTS OF LOCAL STATISTICS IN STATIONARY NORMAL SERIES

We shall assume a reference space $S$ with position vector $t$, and shall use the symbol $\tau$ to
denote a displacement vector, e.g. $t_1 - t_2$. This is a generalization of the usual time axis in
one dimension. A set of stochastic variables $x_1(t), x_2(t), \ldots$ (which may be regarded as
components of a vector random function) with respective means $\mu_1, \mu_2, \ldots$ and variances
$\sigma_1^2, \sigma_2^2, \ldots$, is defined at the points of $S$; in practice the definition is often at a lattice of points
only, but for generality we shall assume definition at all points of $S$. These variables will be
taken to form stationary series, i.e. their probability parameters to be invariant under
translation in $S$.


We shall be concerned with seminvariant local linear functions (s.l.l.f.’s) of these variables


defined as follows: 
Lae | 201,0), faz. = о. 
8 8 


The integration is to be understood in the Stieltjes sense, so that 


La о | ао, 


(13) 


(14) 


where 12, is zero except at a finite number of points of S, and /**(t) is integrable in the ordinary
Riemann sense. These linear functions of x(t) are called seminvariant because they are
unchanged by the addition of any constant to x(t); to justify the adjective local, 1%, and
E*(t) will be taken as zero outside a minimal sphere in S of diameter &, bounded above by
some fixed value a. These s.l.l.f.’s are generalizations of the simple local comparisons
implied in §1.

It will be assumed that the series are multivariate normal. This assumption may, however,
be dispensed with when we are concerned only with second moments of linear statistics,
since the question of normality does not then arise. The assumption of normality implies that

153/15 +. tg + other terms each having at least one of Mala «++ fy 
ав a factor (s odd), 


Bax, (t,) talta) .. 


+ other terms each having at least one of АРА 
ав а factor (s even), 


where af... uv is any permutation of l... 8 such that 
&«fly«0,...,u«v and & «y... «p. 


nt) = Hallas Hat I 00V, e(t, — ts) cov... (t, — tj)... OVa (t, — t) 


(15) 


(16) 


* 


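If we prefer to work with lag variation parameters defined by

$$\delta_{ij}(t - t') = E\,\tfrac12(x_i(t) - x_j(t'))^2, \tag{17}$$

we have

$$\delta_{ij}(\tau) = \tfrac12(\mu_i - \mu_j)^2 + \tfrac12(\sigma_i^2 + \sigma_j^2) - \mathrm{cov}_{ij}(\tau) \tag{18}$$

as the relation which permits us to substitute $\delta$ for cov. In analysis concerned with s.l.l.f.’s,
the seminvariance usually results in the elimination of the terms involving means and
variances. For example,

$$\mathrm{cov}(L_{\alpha i}, L_{\beta j}) = E(L_{\alpha i}L_{\beta j}) \quad (\text{since } E(L_{\alpha i}) = 0)$$

$$= \iint E[x_i(t)x_j(t')]\,dL_\alpha(t)\,dL_\beta(t')$$

$$= \iint [\mu_i\mu_j + \mathrm{cov}_{ij}(t - t')]\,dL_\alpha(t)\,dL_\beta(t')$$

$$= -\iint \delta_{ij}(t - t')\,dL_\alpha(t)\,dL_\beta(t'). \tag{19}$$

The double integration is interpretable in the obvious way in terms of summations and
Riemann integrations.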

In addition to the assumption of normality, we shall make a further important
assumption about the nature of the lag variation functions (l.v.f.’s), an assumption which is very
often reasonable in practice (cf. Jowett, 1953). We shall assume that the l.v.f.'s tend to
become increasingly linear as $|\tau|$ increases; or, more precisely, that for arbitrary (small)
$\epsilon > 0$, and for arbitrary $c$, there is a value $h_0(\epsilon, c)$ such that for $h > h_0$ and $h < |\tau| < h + c$, we can
find a constant $\gamma_{ij}$ and a constant vector $\chi_{ij}$ such that

$$\delta_{ij}(\tau) = \gamma_{ij} + \chi_{ij}\cdot\tau + \epsilon_{ij}(\tau), \tag{20}$$

where

$$|\epsilon_{ij}(\tau)| < \epsilon. \tag{21}$$


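Two s.l.l.f.'s will be described as far apart if their associated minimal spheres touch the outside of a sphere of diameter greater than some predetermined distance $h_0$. If we take $h_0$ as dependent on $\epsilon$ as described above, and take $c$ as twice the maximum local diameter $a$, it follows that if $L_\alpha$, $L_\beta$ are far apart,
$$\begin{aligned}\operatorname{cov}(L_{\alpha i}, L_{\beta j}) &= -\iint \delta_{ij}(t - t')\,dL_\alpha(t)\,dL_\beta(t')\\ &= -\iint [\gamma_{ij} + \chi_{ij}\cdot(t - t') + \epsilon_{ij}(t - t')]\,dL_\alpha(t)\,dL_\beta(t')\\ &= -\iint \epsilon_{ij}(t - t')\,dL_\alpha(t)\,dL_\beta(t'),\end{aligned} \tag{22}$$
and hence that
$$|\operatorname{cov}(L_{\alpha i}, L_{\beta j})| \le \epsilon\int |dL_\alpha| \int |dL_\beta| = O(\epsilon). \tag{23}$$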


On the other hand, if $L_\alpha$ and $L_\beta$ are not far apart, $\operatorname{cov}(L_{\alpha i}, L_{\beta j})$ is a function of $\delta_{ij}(\tau)$ over the range
$$0 \le |\tau| \le h_0 + 2a, \tag{24}$$




164 Sampling properties of local statistics in stationary stochastic series 


i.e. in what may be called a $(h_0 + 2a)$ neighbourhood of $\tau = 0$. If $h_0$ and $a$ are sufficiently small (in practice, $h_0$ is often comparable with $a$ in magnitude), the behaviour of $\delta_{ij}(\tau)$ in such a neighbourhood may be interpreted as a local variational property of the series.

This property of being either dependent on the l.v.f. in a neighbourhood of $\tau = 0$ or of magnitude $O(\epsilon)$ is also true of the cross-moment of several s.l.l.f.'s. For the cross-moment of $s$ of these functions, which have zero expectations, we have


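$$E(L_{1 i_1}L_{2 i_2}\cdots L_{s i_s}) = \int\!\cdots\!\int E[x_{i_1}(t)\,x_{i_2}(t')\cdots x_{i_s}(t^{(s-1)})]\,dL_1(t)\,dL_2(t')\cdots dL_s(t^{(s-1)})$$
$$= \begin{cases} 0 & \text{if } s \text{ is odd},\\[4pt] \displaystyle\int\!\cdots\!\int (-1)^{s/2}\sum_{\alpha\beta\ldots\mu\nu}\bigl[\delta_{(\alpha\beta)}(t^{(\alpha-1)} - t^{(\beta-1)})\cdots\delta_{(\mu\nu)}(t^{(\mu-1)} - t^{(\nu-1)})\bigr]\,dL_1(t)\cdots dL_s(t^{(s-1)}) & \text{if } s \text{ is even}, \end{cases} \tag{25}$$

(where the suffix $(\alpha\beta)$ means $i_\alpha i_\beta$), since all the terms in the expectation of the product in the square bracket which have one or more of $\mu_{i_1}, \ldots, \mu_{i_s}$ as a factor are annihilated in the integration by the seminvariant property of the corresponding $L$'s.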

The summation in (25) is taken over the set of permutations of $1, 2, \ldots, s$ defined in (16). Hence if $s$ is even,
$$E(L_{1 i_1}\cdots L_{s i_s}) = \sum \operatorname{cov}(L_{\alpha i_\alpha}, L_{\beta i_\beta})\,\operatorname{cov}(L_{\gamma i_\gamma}, L_{\delta i_\delta})\cdots\operatorname{cov}(L_{\mu i_\mu}, L_{\nu i_\nu}). \tag{26}$$

Since the right-hand side of (26) is a function of covariances, it follows that $E(L_{1 i_1}L_{2 i_2}\cdots L_{s i_s})$ is either dependent on the l.v.f. in a neighbourhood of $\tau = 0$ or of magnitude at most of order $\epsilon$.
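
The pairing rule behind (16) and (26) can be made concrete with a short enumeration. The following is a sketch under stated assumptions rather than anything given in the paper: the variables are taken to be zero-mean and jointly normal with a known covariance matrix, and the helper names `pairings` and `product_moment` are hypothetical.

```python
def pairings(indices):
    """Yield every way of splitting `indices` (sorted ascending) into pairs.

    Each pair has its smaller member first and the pairs are ordered by
    their first members, which is the restriction on permutations in (16).
    """
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def product_moment(cov, idx):
    """Expectation of a product of zero-mean jointly normal variables.

    `cov` is the covariance matrix and `idx` lists which variable fills
    each position of the product; an odd number of factors gives zero.
    """
    if len(idx) % 2 == 1:
        return 0.0
    total = 0.0
    for pairing in pairings(list(range(len(idx)))):
        term = 1.0
        for a, b in pairing:
            term *= cov[idx[a]][idx[b]]
        total += term
    return total


cov = [[2.0, 0.5], [0.5, 1.0]]
print(len(list(pairings([1, 2, 3, 4]))))
print(product_moment(cov, [0, 0, 0, 0]))
print(product_moment(cov, [0, 0, 1, 1]))
```

For s factors the number of admissible pairings is (s - 1)(s - 3)...1, so the sum in (26) has three terms for s = 4 and fifteen for s = 6.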

If the series $x_{i_1}(t), \ldots$ are made to coincide in sets, and the s.l.l.f.'s $L_{1 i_1}, L_{2 i_2}, \ldots$ coincide in sets included in them, the result just established takes the following form:

Theorem. The expectation of any product of powers of s.l.l.f.'s of any set of concomitant stationary stochastic variables for which the s.v.f.'s and the l.v.f.'s have a tendency towards linearity with increasing $|\tau|$ (as defined by (20) and (21) and the preceding paragraph) is a sum of terms which are either of magnitude at most of order $\epsilon$ or involve only values of the s.v.f.'s and l.v.f.'s in a $(h_0 + 2a)$ neighbourhood of $\tau = 0$.

It follows that the moments and cumulants of any statistic which is a product of powers of s.l.l.f.'s have the same property, since these are expressible as linear functions of products of powers; and hence that the moments and cumulants of any statistic which can be expressed as a linear function of products of powers have the same property. The theorem is thus seen to be very general in character, for most if not all statistics which measure local properties and are of interest in practice may be expressed either exactly or approximately in this form. The theorem implies, broadly speaking, that the variational properties of local statistics depend essentially on local variational properties of the parent series.




3. ASYMPTOTIC MOMENTS OF LARGE-SAMPLE LOCAL STATISTICS 


Suppose that the statistics $L_{1 i_1}, \ldots, L_{s i_s}$ ($s$ even and divisible by an integer $p$) fall into $r$ sets of $p$, namely, $\mathcal{M}_1, \ldots, \mathcal{M}_r$. Suppose, moreover, that the product of the statistics falling into the set $\mathcal{M}_k$ is denoted by $M_k$. Thus
$$M_1 = L_{1 i_1}L_{2 i_2}\cdots L_{p i_p},\quad M_2 = L_{p+1,\,i_{p+1}}\cdots L_{2p,\,i_{2p}},\quad \ldots,\quad M_r = L_{s-p+1,\,i_{s-p+1}}\cdots L_{s i_s}. \tag{27}$$




It is easily shown that
$$E\{(M_1 - E(M_1))(M_2 - E(M_2))\cdots(M_r - E(M_r))\} = \sum_{\alpha\beta\ldots\mu\nu}\operatorname{cov}(L_{\alpha i_\alpha}, L_{\beta i_\beta})\,\operatorname{cov}(L_{\gamma i_\gamma}, L_{\delta i_\delta})\cdots\operatorname{cov}(L_{\mu i_\mu}, L_{\nu i_\nu}), \tag{28}$$
where $\alpha\beta\ldots\mu\nu$ is a permutation of the integers $1 \ldots s$, subject to the conditions (16) and also to the condition that no pair of suffixes associated with a covariance included in the formula may come from the same set $\mathcal{M}_k$.

Two sets will be said to be far apart if the members of one set are far apart from the members of the other. If any set is far apart from all the others, the product-moment (28) will be $O(\epsilon^p)$, since every member of the set has to be associated in a covariance with some member of another set. For some of the terms in (28) not to be $O(\epsilon^p)$, for each set there must be at least one other set from which it is not far apart; if this is so, terms which are not $O(\epsilon^p)$ arise, but only when members from such neighbouring sets occur together in the covariance brackets.

Many statistics which are of practical interest are means, or functions of means, of powers or products of powers of s.l.l.f.'s of the stochastic variables over a sample region of the space S or at a set of sample points, usually evenly spaced. Suppose that we are given sets $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_n$, where $n$ is large (for example, of the same order of magnitude as the number of sample points), and where the sets are evenly spread over the sample region in such a way that every subregion of volume $O(V/n)$ (i.e. $O(1)$) contains sets to the number $O(1)$. Then if
$$U = n^{-1}(M_1 + M_2 + \cdots + M_n), \tag{29}$$
we have
$$E(U - E(U))^r = n^{-r}\,E\sum_{a_1\ldots a_r}(M_{a_1} - E(M_{a_1}))\cdots(M_{a_r} - E(M_{a_r})), \tag{30}$$
where $a_1 \ldots a_r$ is a permutation of $r$ of the integers $1 \ldots n$.

If we assume $h_0 \ll n$, the number of terms in (30) which are not $O(\epsilon^p)$, i.e. which can be formed from products of covariances of s.l.l.f.'s which are not far apart, is $O(n^{r/2})$, since the number of ways of choosing pairs of sets from $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_n$ which are not far apart is of this order. Hence
$$E(U - E(U))^r = O(n^{-r/2}) + O(\epsilon^p), \tag{31}$$
the second term on the right-hand side of (31) being justified by the fact that the summation in (30) has only $n^r$ terms altogether, and is divided by $n^r$.

If $\epsilon^p = o(n^{-r/2})$, the moment will be dominated by those terms which involve only values of the l.v.f. in a $(h_0 + 2a)$ neighbourhood of $\tau = 0$, and will be of magnitude $O(n^{-r/2})$.

This argument may be generalized in a fairly obvious way to show that cross-moments of order $r$ about the mean for statistics having the same form as $U$ also have this property and this order of magnitude. Thus, since many statistics of useful potential application have sampling errors which may be expressed, to a large-sample approximation at least, as linear functions of products of sampling errors of statistics having this form, it may be shown that the lower moments of these also depend essentially on values of the l.v.f. in a restricted neighbourhood of $\tau = 0$.
4. EXAMPLES 


Example 1. Variance of a trend-reduced covariance


Suppose we have a sample stretch $x_1y_1, x_2y_2, x_3y_3, \ldots, x_ny_n$ from the variation of two concomitant series, $x$, $y$. The covariance of the difference between successive terms, which is the simplest case of a trend-reduced covariance (cf. Jowett, 1955a), is defined by the equation
$$U = (n-1)^{-1}[(x_1 - x_2)(y_1 - y_2) + \cdots + (x_{n-1} - x_n)(y_{n-1} - y_n)]. \tag{32}$$


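Then
$$\begin{aligned}\operatorname{var} U &= (n-1)^{-2}\sum_{i,j=1}^{n-1}\operatorname{cov}[(x_i - x_{i+1})(y_i - y_{i+1}),\,(x_j - x_{j+1})(y_j - y_{j+1})]\\ &= (n-1)^{-2}\sum_{i,j=1}^{n-1}\bigl\{\operatorname{cov}[(x_i - x_{i+1}),(x_j - x_{j+1})]\,\operatorname{cov}[(y_i - y_{i+1}),(y_j - y_{j+1})]\\ &\qquad\qquad + \operatorname{cov}[(x_i - x_{i+1}),(y_j - y_{j+1})]\,\operatorname{cov}[(y_i - y_{i+1}),(x_j - x_{j+1})]\bigr\}\\ &= (n-1)^{-2}\sum_{i,j=1}^{n-1}\bigl\{\Delta''\delta_{xx}(i-j)\,\Delta''\delta_{yy}(i-j) + [\Delta''\delta_{xy}(i-j)]^2\bigr\}\\ &= (n-1)^{-2}\sum_{s=-(n-2)}^{n-2}(n-1-|s|)\bigl\{\Delta''\delta_{xx}(s)\,\Delta''\delta_{yy}(s) + [\Delta''\delta_{xy}(s)]^2\bigr\}.\end{aligned} \tag{33}$$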


For independent series, $\delta_{xy}(s)$ is constant for all $s$, so that
$$\Delta''\delta_{xy}(s) = 0. \tag{34}$$
The dominant terms in the summation are those for small $|s|$; as $|s|$ increases, and the $\delta$'s straighten out, the remaining terms will rapidly tend to zero.
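
The statistic of Example 1 and the second difference that governs its variance are easy to compute directly. The sketch below is an illustration only: it assumes the divisor $(n-1)$ in (32) and reads $\Delta''$ as the central second difference over unit lag; the function names are hypothetical.

```python
def trend_reduced_cov(x, y):
    """Mean product of successive differences of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length, at least 2")
    total = sum((x[i] - x[i + 1]) * (y[i] - y[i + 1]) for i in range(n - 1))
    return total / (n - 1)


def second_difference(delta, s):
    """Central second difference of a lag variation function at lag s."""
    return delta(s + 1) - 2 * delta(s) + delta(s - 1)


# A lag variation function that is exactly linear in |s| away from the origin.
linear_lvf = lambda s: 0.5 * abs(s)

print(trend_reduced_cov([0, 1, 3], [0, 1, 3]))
print(second_difference(linear_lvf, 0), second_difference(linear_lvf, 3))
```

With an exactly linear lag variation function the second difference vanishes at every lag except the origin, so only the term for s = 0 would survive in a sum of this kind.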


Example 2. The covariance of two serial variation statistics 
The serial variation statistics of lags $s_1$ and $s_2$ ($s_1 < s_2$) calculated from a stretch of series
9, ...2, are defined as follows: 


n—8,—1 
4(ву) = (n—8,)+ 2 3)2; — Lita) 


n—8,—1 
d(s) = (0—8) Y, а-а). (35) 
Then id 
cov (d(s;), d(s;)) siis bite 
= (9-8) inat У exe cease t) 
= (n— 83)" (n— Др? 2[соу (%(4 — 2,4), (0; — 24.) 
= (n—58,)- (n— s) mE —C,).4[— d(x) — (a + 84 —8,) + (à — 84) + 0) + 8,)]* 
n—8,—1 REM 
PE sis 5 ae a HE 990-9955) X 8) ++), (36) 
where 8 +|o|, 8,—8, «a €m—58,—1, 
„= Н 0<a<8,—8,, (37) 
8&-|x|, —(n—8—1)<a<0. 


The dominant terms in the summation in (36) are again those for small $|s|$. As $|s|$ increases, and $\delta(s)$ straightens out, the remaining terms, being proportional to squares of a kind of second difference, again tend rapidly to zero.
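
A serial variation statistic of the kind used in Example 2 can be sketched as follows. The exact summation limits of (35) are not recoverable from the text, so this sketch assumes the common form in which half the squared differences at lag s are averaged over the n - s available pairs; the function name is hypothetical.

```python
def serial_variation(x, s):
    """Half the mean squared difference between terms s apart (assumed form)."""
    n = len(x)
    if not 0 < s < n:
        raise ValueError("lag must satisfy 0 < s < len(x)")
    total = sum((x[i] - x[i + s]) ** 2 for i in range(n - s))
    return total / (2.0 * (n - s))


series = [0, 1, 0, 1, 0]
print(serial_variation(series, 1), serial_variation(series, 2))
```

Like any statistic built from differences, the result is unchanged when a constant is added to every term of the series.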


Example 3. Spatial systematic sampling 
The systematic sampling will be of the variable $z(u, v)$ which is defined over the rectangular region $0 \le u \le r_1$, $0 \le v \le r_2$, samples being taken at the points $(i - \tfrac12, j - \tfrac12)$ for integral $i$, $j$ such that $i = 1, \ldots, r_1$, $j = 1, \ldots, r_2$.


i 
Let L,-2a(i-j-3- RT fae v)dudo. (38) 


Then 
+1 -'+ 
eor (Lay Ley) = 286-0) |, jet -| 4-j + 0,4 


o-1 $-i-i-1 
+f f "D f Aui 60-7 - dude. (39) 
(-1J j'-1 (173-1 
This covariance is itself a s.l.l.f. of $\delta(\theta, \phi)$ over the region
$$i - i' - 1 \le \theta \le i - i' + 1,\quad j - j' - 1 \le \phi \le j - j' + 1, \tag{40}$$
and will be negligible if the points $(i, j)$, $(i', j')$ are far enough apart.


The sampling variance of the mean $U$ of a systematic sample of $n$ ($= r_1 r_2$) about the mean of the rectangular region is therefore given by


varU = varo $ X La) (41) 
i21 

et oS orit. a mt 2004—57), вау. 42 

n n» v (Li, Lys) = (n ) Ete 7,3—3). вау (42) 


In evaluating the summation in (42), we take the terms in order of increasing distance 
between (i,j), (i’,j’). Hence 


var = Heo] c(1,0) - m «o. |+ 0). (43) 


4n 
(ry — Us n(rs- 1) (rı = 1) (ras¬ 1) 
As the distance increases and $\delta(\theta, \phi)$ becomes locally linear, the terms will become negligible.


In Ex. 1, 2 and 3 we are dealing with second moments, corresponding to $r = 2$ in (30). Hence we verify that they are all of order $n^{-r/2} = n^{-1}$, provided that local linearity is attained rapidly enough. The next example is different, involving a greater value of $r$.


Example 4. Fourth moment of the statistic U= (n3) {(жу — 23) + (@— за) +... + oni — Ton)? 
The statistic $U$ is identical with that defined in (1). In the expansion of $U^4$, the expectation of a typical term is given by
Elai- Lai) (Para 7 zy) (agra Tar) Uni Жз") 
= А"8(2Е—7)А"8(@” =") +5"8(@—%") A"8(2 —i7) + A'0(2i — i) A'a(2i =i") 
= Uwr 88У. (44) 
The value of this depends only on the configuration of $i$, $i'$, $i''$, $i'''$. If these are placed in increasing order of magnitude, we may denote the central interval by $q$, and the other two by $p$ and $r$, where $p \le r$. The number of terms giving the same values of $p$, $q$ and $r$, and hence the same expectations, is equal to $(n - p - q - r)$ multiplied by the factor $a_{pqr}$ given below:


Range of p, 9 T арат 
o<p<r; 4>0 2.4! 
0<p=r; 920 4! 
0=p<r; q>0 2.41/2! 
DEA an 
о<р<т; q=9 2.41/21 
0<p=r; 9=0 41/2! 
0=р<"; q-9 2.41/3! 


0=р=4=" 1 




If we write the total of these terms in each case as $(n - p - q - r)B_{pqr}$, we obtain the following result:
E(U)-n* У tire 
Lit i=l 
= п-% У (%—р—4—т) Boor 
p,qr 
(p&r,ptqr&n-1) 
n—1min(r,n-r-1)n—-p-r 


=n 2k 5 2 (n—p—q—1) Bpr (45) 
r=0 p=0 q=0 

If for $ > so, (2s) differs from linearity over the interval (2s — 1, 28+ 1) by less than some 

small quantity є, we have | A*à(2s) | < 4. (46) 


Since By А"8(2р) A"0(2r) + A"3(2p +q) A'0(2q +r) + A"8(2p+q+r)A"(2q), (47) 


$B_{pqr}$ will be of magnitude $O(1)$ if both $p$ and $r$ are less than $s_0$, of magnitude $O(\epsilon)$ if just one of $p$, $q$, $r$ is less than $s_0$, and of magnitude $O(\epsilon^2)$ if none is less than $s_0$.

The summation in (45) may be separated into parts having different orders of magnitude. If $s_0 \ll n$,


8-1 r n—-r—-p-1 
E(U)-w*X» X X (n-p-q-7B, 
r=0p=0 q=0 


n—1 8—1 n—r—p— 


1 
+n* У x (n—p—q—1) Bog 


n—1 min(r,n—r—1) min(s.—1,n—r—p—1) 


+17 У x 2j (n—p—q-1) В.а 
=8, P=8 q=0 
n—1 min(r,n—r—1) n—r—p—1 

+n у, 2) 23 (n—p—4-7) В. (48) 
T—8, DP=8 q=min (%—1,n—r—p—1) 


The four terms on the right-hand side of (48) are of magnitudes respectively 
O(n), O(n—te!), O(n-e) and O(n%?). 
Hence if
$$\epsilon = o(n^{-1}) \tag{49}$$
the right-hand side of (48) will be dominated by the first term. To order $n^{-2}$, this may be put in a more readily computable form:


E(U*) ~ п $, ja, A"8(2p) A^9(2r), (50) 
REC 
* n-r-p-1 
since = (n—p—q—r) = Mn-r-p)(n-r—p—-1) p, (51) 


and ара = ар, for g>0. To this order, we may neglect the term in apor: 

This example has been chosen because of its comparative simplicity, to illustrate the way in which the summations in these problems have to be arranged in order to take the important terms first. It is a fourth moment, corresponding to $r = 4$ in (30), and we have verified that it is of order $n^{-r/2} = n^{-2}$, provided that linearity in $\delta(s)$ is attained rapidly enough. It might be required in allowing for the effect of terms of order $n^{-1}$ in the test of significance of the treatment effect in (5) (cf. Small, 1954), or in obtaining control chart limits correct to order $n^{-1}$ for the difference between means of interpenetrating systematic samples from, say, a conveyor belt.


REFERENCES 


BARTLETT, M. S. (1946). Some aspects of the time-correlation problem in regard to tests of significance. J.R. Statist. Soc. 98, 536.

JOWETT, G. H. (1952). The accuracy of systematic sampling from conveyor belts. Appl. Statist. 1, 50.

JOWETT, G. H. (1953). The comparison of means for industrial time series. Paper given to Royal Statistical Society conference.

JOWETT, G. H. (1955a). Least squares regression analysis for trend-reduced time series. J.R. Statist. Soc. B (in the Press).

JOWETT, G. H. (1955b). The comparison of means of industrial time series. Appl. Statist. 4 (in the Press).

SMALL, V. J. (1954). M.Sc. Thesis for the University of Sheffield.




MODELS FOR TWO-DIMENSIONAL STATIONARY 
STOCHASTIC PROCESSES 


By V. HEINE 
Applied Mathematics Laboratory, Department of Scientific and Industrial Research, 
Wellington, N.Z.* 
To aid the analysis of two-dimensional stationary processes, three different models are considered, derived from the second-order stochastic partial differential equation. Their correlation functions are calculated, for fitting data in the form of a correlogram to the model. The corresponding Green functions describe the physical nature of the models, and in particular distinguish between space-like and time-like axes.


1. INTRODUCTION 


The occurrence of stationary stochastic processes in two dimensions is well known. In analysing such data, it is at least useful, and occasionally of theoretical importance, to fit the correlogram to a particular plausible model and to estimate certain parameters in the model accordingly. However, owing to mathematical complexities, only the correlation functions generated by the very simplest models have been investigated in the past, particularly by Whittle (1954), and it would seem desirable to extend the range of available models.

We shall consider the general second-order linear stochastic partial differential equation 


Ф? 02 e? д 
(ав hay + уя tes +e) Eo = 4) (1-1) 


where $\xi$ and $\epsilon$ are the variate and the random impulses affecting it respectively, both with means zero. This leads to three types of model, corresponding to parabolic, elliptic and hyperbolic forms, whose correlation and Green functions we shall derive.

In two-dimensional processes, one is necessarily concerned with the difference between time-like and space-like axes. In a time series, the variate can only be influenced by past events; accordingly, a time-like axis is one along which a random impulse can only produce an effect in one direction. Besides time itself, examples are the distance down a steep hillside, and the distance in the direction of a strong wind scattering seeds. Along a space-like axis, the variate depends on events in both directions. We discuss these features for each model in terms of the Green function, which represents the effect at one point of a random impulse at another point. In fact, the whole nature of the process is best visualized by considering the Green function, which thus shows what kind of physical processes the models may represent.


2. PRELIMINARY THEORY 


The results of §§ 2 and 3 are analogous to those relating to one-dimensional time series, and we present them here but briefly in the particular form required later (cf. Bartlett (1946) and Daniell (1946)).

Consider the stochastic partial differential equation
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)\xi(x, y) = \epsilon(x, y). \tag{2·1}$$


* Now at Sidney Sussex College, Cambridge. 



Assuming the validity of inverting the order of differentiation and integration, the solution may be written formally
$$\xi(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(x - u, y - v)\,\epsilon(u, v)\,du\,dv, \tag{2·2}$$
where the Green function $G(x, y)$ satisfies
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right)G(x, y) = \delta(x)\,\delta(y), \tag{2·3}$$
using the Dirac delta function (cf. Van der Pol & Bremner, pp. 75, 315). The physical interpretation of (2·2) and (2·3) is that $G(x - u, y - v)$ represents the effect at the point $(x, y)$ of a unit impulse at the point $(u, v)$.

If the $\epsilon(u, v)$ are entirely uncorrelated random impulses, we have for the covariance functions
$$\operatorname{cov}_\epsilon(x, y) = \text{expectation of } [\epsilon(u, v)\,\epsilon(u - x, v - y)] = \sigma^2\,\delta(x)\,\delta(y), \tag{2·4}$$
and
$$\operatorname{cov}_\xi(x, y) = \sigma^2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(u, v)\,G(u - x, v - y)\,du\,dv \tag{2·5}$$
by using (2·2) and (2·4). In what follows, it will be convenient to calculate the function
$$R(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} G(u, v)\,G(u - x, v - y)\,du\,dv, \tag{2·6}$$
which can then be normalized right at the end, to give the correlation function of $\xi$ in the form
$$\rho(x, y) = R(x, y)/R(0, 0). \tag{2·7}$$
Further, $G$ and $R$ must tend to zero at infinity, and $R$ must be finite everywhere.


3. USE OF THE LAPLACE TRANSFORM
The manipulative complexities encountered in calculating the Green and correlation functions from (2·3), (2·6) and (2·7) are handled using the two-sided Laplace transform. We follow completely the notation and dictionary of formulae in Van der Pol & Bremner (1950), reference to whom will be made simply by the page numbers. Thus if $f(p, q)$ is the transform of $h(x, y)$ we write (pp. 18, 334-6)
$$h(x, y) \doteqdot f(p, q),$$
where
$$f(p, q) = pq\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-px - qy}\,h(x, y)\,dx\,dy.$$

If $g(p, q)$ be written for the transform of $G(x, y)$, (2·3) gives (pp. 48, 345, 384)
$$L(p, q)\,g(p, q) = pq,$$
i.e.
$$G(x, y) \doteqdot g(p, q) = \frac{pq}{L(p, q)}. \tag{3·1}$$
The transform of (2·6) may be written down using (3·1) and the composition product formula (pp. 39, 382); thus
$$R(x, y) \doteqdot \frac{g(p, q)\,g(-p, -q)}{pq} = \frac{pq}{L(p, q)\,L(-p, -q)}. \tag{3·2}$$

4. STANDARD FORMS 


By the simple substitutions z^ = — a, kx, + у, zcos0 -- ysin б and in the hyperbolic case by 
2 = ж-+сүу,у' = z-- cy; the general form (1-1) can always be reduced to one ofthe following 
standard forms: 


parabolic (6+) 0o): (41) 
elliptic &- a) + ^s £y (4-2) 
RC, NT C a 
degenerate Уз tan +f: (4-4) 


where $\alpha$, $\beta$, $\gamma$ are all real and positive, or zero.


5. THE PARABOLIC FORM 


Consider Ws : 8) £ (2*2) -(@+ DE (5-1) 
From (3-1) elect YS rea | (5:2) 
hence et Pr Gar, = (р.374) 
= Ре (20) U0) (р. зв) 
һепсе G(x,y) = 5; ag (m - 7 U(y) (p.21) (5-3) 
US 2y (ry) 4y : 
where U(y) =1 (y>0); (5:4) 
=0 (y«0) 


The case of the lower sign in (4-1) is covered by considering / as negative in (5:1), which 
makes G in (5:3) non-vanishing at infinity, and is thus inadmissible. 

Equation (5·3) shows that an impulse at the origin only has an effect at positive values of y. Thus the y-axis is a time-like axis, whereas the x-axis is space-like. The Green function has the shape of a Gaussian error curve in the x-direction, with ever-increasing variance and decreasing amplitude as y increases. Hence the y-axis may well represent distance downhill, downstream or along the direction of a wind in any kind of diffusion phenomenon. Otherwise it may represent time, during which something spreads out along a line, for instance, the descendants of a plant.


From (3-2), 
R(x, TE a a 
“сете зу 


: 20, 2.2 
dex (SP y) If exp БЕ (р. 27). (5:5) 


Without loss of generality, consider the two quadrants with $y > 0$. Thus rearranging,


ONE teji p ]; 
R(x, y) e| -— aui 1 ee aom rna j 
gat Ta done rry (p-a) |p en 
E Б (prafy| , е AP) " 
Now pro ENR A |: s pm i (p. 21), (5-7) 
p сй! =» AY (узв ; : 
= ураа РИ 870181): 0:81). е 


We can now use the composition product rule (p. 39), also (5·7) and (5·8), in (5·6):
R(x, y) = (constant) ©” i. 3 exp ( -ат— o — (y! —a2)| x-T ) ат. (5:9) 


The integral in (5·9) may be simplified by breaking it up into two integrals over the ranges $\tau < x$ and $\tau > x$ respectively, and then using linear substitutions


ИТ) 


Using (2·7) to insert the appropriate normalizing constant, we finally have


p(z,y) = p(-z,—y), where y>0, 
e-24B [A-B 


= Г, edt 
t [at m 
ae ande ТЕ) 
with O<at<fy? [see (5-13)]. 


As emphasized by Van der Pol & Bremner (1950, pp. 19, 27), to ensure the validity of the above argument, it is necessary to show that (5·2)-(5·9) possess a common non-vanishing region of convergence. This requirement eliminates several other solutions of (5·2) and (5·5) which would otherwise appear to be formally possible, and also imposes some sufficient conditions on $\alpha$, $\beta$ and $\gamma$. By considering (5·1), (2·3) and (2·6) directly, without the use of the Laplace transform, these conditions may be shown to be also necessary.

Thus (5:3) requires (pp. 26, 374) 


ЕХЕ (5-11) 
(5-5) in addition requires ri te |a (es | di 


For these regions to overlap, we must have the left-hand expression in (5-11) less than the 
right-hand expression in (5-12), which leads to 

0<а®< fy’. (5-13) 
It may be shown that if and only if (5:13) holds, do (5:2) to (5-9) have a non-vanishing 
common region of convergence. 




6. ELLIPTIC FORMS 
The special form
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} - \gamma^2 \tag{6·1}$$
has been treated by Whittle (1954), who used it in discussing the yields of orange trees in a square array. For the sake of completeness, we repeat
$$G(x, y) = -\frac{1}{2\pi}K_0(\gamma r) \doteqdot \frac{pq}{p^2 + q^2 - \gamma^2} \quad \text{(pp. 357, 407)}. \tag{6·2}$$
$R(x, y)$ is obtained by differentiating (6·2) with respect to $\gamma$ (p. 373). Then from (2·7),
$$\rho(x, y) = (\gamma r)K_1(\gamma r). \tag{6·3}$$

Here and elsewhere, $r = \sqrt{(x^2 + y^2)}$, and the $K$'s are the modified Bessel functions of the second kind. Both the x- and y-axes are space-like, and for (6·1) the Green and correlation functions decrease monotonically equally in all directions. A simple change of scale in the x- or y-direction introduces unequal degrees of correlation along these axes. This model would be applicable to a wide variety of circumstances, such as field trials on flat land.
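
For fitting a correlogram, the correlation function (6·3) has to be evaluated numerically. The following is a minimal sketch using only the standard library; it is not from the paper. It assumes the standard integral representation $K_1(z) = \int_0^\infty e^{-z\cosh t}\cosh t\,dt$ for $z > 0$ and a plain trapezoidal rule, and the function names and the step count are illustrative choices.

```python
import math


def bessel_k1(z, steps=20000):
    """Modified Bessel function K_1(z) for z > 0 by trapezoidal integration."""
    if z <= 0:
        raise ValueError("z must be positive")
    # Truncate where z*cosh(t) exceeds 50, beyond which the integrand is negligible.
    upper = math.acosh(50.0 / z + 1.0)
    h = upper / steps

    def integrand(t):
        return math.exp(-z * math.cosh(t)) * math.cosh(t)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for k in range(1, steps):
        total += integrand(k * h)
    return h * total


def whittle_correlation(x, y, gamma):
    """Correlation (gamma r) K_1(gamma r) at displacement (x, y)."""
    r = math.hypot(x, y)
    if r == 0.0:
        return 1.0  # limiting value of z*K_1(z) as z tends to 0
    z = gamma * r
    return z * bessel_k1(z)


print(round(whittle_correlation(0.3, 0.4, 1.0), 3))
```

A library routine for $K_1$, where one is available, would normally be preferred; the quadrature here is only meant to keep the example self-contained.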

A form involving a further parameter is : 


ә д ð з әз 
Цеа) E) tig di 
From (3-1), (6-2) and pp. 357, 339, 
Q(z, y) = ar Kol aoa (6-5) 


This Green function is the same as (6·2), except that the term $\exp(\alpha x)$ increases the influence an impulse has in the positive x-direction at the expense of the negative x-direction, and thus may represent the effect of a wind or a slight slope of the ground. From (3·2),


hz ЧУ. pq gt p4 
Beam |а аа cz eri 


Using (6-5), the integration rule (р. 51) and (2-7) 


4(y*—a*) (= 


? 0 = ачар), 


sinh (ат) Ko(y V(r? + y?)) dr, (6-6) 
where the normalizing constant follows from a formula in Watson (1944) on p. 388. Convergence of (6·6) requires $\alpha < \gamma$.

All other elliptic forms are inadmissible. Consider 


90 0 д 1 J 
Uz, 5) = (&-—°) tà t (6-7) 
If $\alpha = \gamma = 0$, the Green function is $\log r$ which does not vanish at infinity. If $\alpha = 0$, $\gamma \ne 0$, the Green function is $\tfrac14 Y_0(\gamma r)$ (p. 357), and the integral in (2·6) does not converge. If $\alpha \ne 0$, $\gamma \ne 0$, the Green function $\tfrac14\exp(\alpha x)\,Y_0(\gamma r)$ does not vanish at infinity.




7. HYPERBOLIC FORMS 


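Consider
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) = \left(\frac{\partial}{\partial x} + \alpha\right)\left(\frac{\partial}{\partial y} + \beta\right) + \gamma^2. \tag{7·1}$$
From (3·1) and p. 346, and using the function $U$ defined in (5·4),
$$G(x, y) = e^{-\alpha x - \beta y}\,J_0(2\gamma\sqrt{(xy)})\,U(x)\,U(y) \doteqdot \frac{pq}{(p + \alpha)(q + \beta) + \gamma^2} \tag{7·2}$$
is the only Green function that vanishes at infinity. Its form shows that any impulse can only have an influence in the positive x- and y-directions, which therefore represent two time-like axes.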


Fig. 1. Path of integration for (7-7). 


From (3:2), А 3 
iA rm p4 ) 
Re EE TEES ol. Ni? 
with region of convergence $|\operatorname{Re} p| < \alpha$, $|\operatorname{Re} q| < \beta$. If we put
f(x,y) = gaz Pi Jy(2y 4 (xy) С(®) U(y) = e2 +P J(2y Jey) U( 2) U(— 9) 
== Pq VE P1 , 7:4 
[оет scar (4) 
B а mee a(s rag) ten = fen (75) 


Hence R(X, Y) is obtained by integrating f(x,y) of (7-4) in the ¢ direction, i.e. along PQ 
(Fig. 1), making an angle tan (0/8) with the x-axis: 


1 * 
REY) = gg po] of (76) 


In the first quadrant, f(z, y) is equal to the first term in (7-4), and 


= Фай NS -ax p а . 
PY) e Tarp | (+) [zr ena (тл) 


where the normalizing constant is obtained from a formula on p. 384 in Watson (1944). In the fourth quadrant, $f(x, y)$ is zero, so that $\rho$ is constant along any line AB (Fig. 1), and is equal to $\rho(B)$ as given by (7·7). In the third and fourth quadrants, $\rho$ is obtained from the relation $\rho(X, Y) = \rho(-X, -Y)$. It is necessary for the convergence of (7·7) that $\alpha > 0$ and $\beta > 0$. In evaluating (7·7) numerically, it would be helpful to employ a change of scale to make $\alpha$ and $\beta$ each $1/\sqrt{2}$.


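The form
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) = \left(\frac{\partial}{\partial x} + \alpha\right)\left(\frac{\partial}{\partial y} + \beta\right) \tag{7·8}$$
may be considered as a special case of (7·1) with $\gamma = 0$, but it is easier to start again from first principles. From (3·1) and p. 386,
$$G(x, y) = e^{-\alpha x - \beta y}\,U(x)\,U(y) \doteqdot \frac{pq}{(p + \alpha)(q + \beta)}. \tag{7·9}$$
From (3·2), (2·7) and p. 386,
$$\rho(x, y) = \exp(-\alpha|x| - \beta|y|) \tag{7·10}$$
over all four quadrants.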
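
The step from the Green function (7·9) to the correlation function (7·10) can be checked by evaluating (2·6) numerically. The sketch below is illustrative and not part of the paper: it uses the fact that the integrand factorizes into an x-part and a y-part, integrates each with a midpoint rule over a truncated range, and then normalizes as in (2·7). The function names, the step count and the truncation length are assumptions made for the example.

```python
import math


def one_sided_overlap(rate, shift, steps=40000, span=20.0):
    """Midpoint-rule value of the integral of exp(-rate*u) * exp(-rate*(u - shift))
    over u >= max(0, shift), truncated after `span / rate` units."""
    start = max(0.0, shift)
    h = span / (rate * steps)
    total = 0.0
    for k in range(steps):
        u = start + (k + 0.5) * h
        total += math.exp(-rate * u) * math.exp(-rate * (u - shift))
    return h * total


def correlation_from_green(x, y, alpha, beta):
    """R(x, y) / R(0, 0) for G(x, y) = exp(-alpha*x - beta*y) U(x) U(y)."""
    r_xy = one_sided_overlap(alpha, x) * one_sided_overlap(beta, y)
    r_00 = one_sided_overlap(alpha, 0.0) * one_sided_overlap(beta, 0.0)
    return r_xy / r_00


numeric = correlation_from_green(0.7, -1.3, alpha=1.2, beta=0.8)
closed_form = math.exp(-1.2 * abs(0.7) - 0.8 * abs(-1.3))
print(abs(numeric - closed_form) < 1e-6)
```

The agreement in a quadrant with one negative displacement illustrates why (7·10) holds over all four quadrants even though the Green function itself is confined to the first.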
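Considering the form
$$L\left(\frac{\partial}{\partial x}, \frac{\partial}{\partial y}\right) = \left(\frac{\partial}{\partial x} + \alpha\right)\left(\frac{\partial}{\partial y} + \beta\right) - \gamma^2, \tag{7·11}$$
the above analysis (7·1) to (7·7) applies, except that $J_0$ must now be replaced by $I_0$ and the sign of $\gamma^2$ changed. In this case it is necessary that $\gamma^2 < \alpha\beta$ for convergence.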


8. DEGENERATE CASES 


дз? ду, 
are strictly speaking inadmissible. For from (3:2), 


Degenerate forms u >) = COME 8-1) 
вепе О 1 “дуз ay P ( 


$$R(x, y) = \delta(x)\,R_1(y), \tag{8·2}$$
where $R_1(y)$ is some function of $y$; and thus $R(x, y)$ cannot be normalized in the sense of (2·7), because of the delta function.

However, a case may well arise in practice, where the correlogram is approximately zero 

everywhere except along one direction, say the y-axis. It is therefore necessary to consider 


forms arbitrarily close to (8-1); or what is the same thing, to consider (8-1) as the result of 
some limiting process. 


The important fact is that the result depends on the particular limiting process used. 
How this is possible may be seen as follows. Consider the rather restricted form 


ә д UOS 
Щз 5) wii limit (8-1) as у — 0. (8:3) 
The correlation function may always be written in the form 


P(x; y) = р(0, у) N(x/y,y), (8-4) 




where N(z/n,y)=1 for z=0, 
= small for x>7, since p — 0 as 2/7 -» oo. 


In the limit, the form (8:1) has the correlation function 


y) = N(x), 8:5 
where the null function (х,у) = pry) Nt) (8:5) 
N(x) = 1 (ж= 0); 

=0 (+0). 


there are many different forms satisfying (8:3), so there are different correlation functions 
(8-4) and (8-5). We give the following example. 


Limiting form: at ГА 


T (ка) (2+). 


Limiting correlation function, from (7:10): 
pla,y) = e-P" N(x). 


II form: (rE) atA) 
Limiting correlation function, from (5:1): 
plz, y) E шш m e? dt. 
N vuv) 


Corresponding results for other more general forms of the type (8:1) may be easily 
obtained. 


9. SUMMARY OF ALLOWED MODELS 


The allowed standard forms (cf. $4) are as follows: 
Parabolic; x-axis space-like, y-axis time-like: 


erh 
0«a? « бу®; 


Green function (5:3), correlation function (5: 10): 
Elliptic; v- and y-axes space-like: ы 
д 2 
ra 
0<2< У; 


Green function (6-5), correlation function (6:6): 
Hyperbolic; x- and y-axes time-like: 


D д 
(+9) (5*9) +7, 
a>0, 8>0, 720 


Green function (7-2), correlation function (7-7): also 


д ð F 
tat- 
a>0, #>0, 0<у?<ар; 
for Green and correlation functions, see above (7-11). 


The problem was suggested by Dr P. Whittle, to whom I am also indebted for helpful 
discussion. 


REFERENCES 


BARTLETT, M. S. (1946). Stochastic Processes. Mimeographed, North Carolina lecture notes.

DANIELL, P. J. (1946). Contribution to discussion on stochastic processes. J. R. Statist. Soc. B, 8, 88.

VAN DER POL, B. & BREMNER, H. (1950). Operational Calculus based on the Two-sided Laplace Transform. Cambridge University Press.

WATSON, G. N. (1944). The Theory of Bessel Functions, 2nd ed. Cambridge University Press.

WHITTLE, P. (1954). On stationary processes in the plane. Biometrika, 41, 434.




SOME PROBLEMS IN THE THEORY OF PROVISIONING 
AND OF DAMS 


By J. GANI
Australian National University, Canberra, A.C.T.


The paper begins with a review of two problems in the theory of provisioning with a discrete stock considered by Pitt (1946), and a short account of Moran's work (1954) in the theory of finite dams. It is pointed out that these provide two different methods of attack, each appropriate to certain conditions, on problems in the probability theory of a general storage function $S(t)$ defined at time $t$ by
$$S(t) = I(t) - D(t) - F(t),$$

where $I(t)$, $D(t)$, $F(t)$ are respectively an input, output and overflow function. This storage function is identified with the stock deficit in provisioning theory, or the dam content in dam theory, so that any problem and its solution in the one theory has an exact analogue in the other.

The paper continues with the use of Pitt's results of the theory of provisioning in two analogous 
cases of the infinite discrete dam. This is followed by the application of Moran’s methods of the theory 
of dams in some analogous problems of provisioning with a discrete finite stock; exact solutions are 
obtained for the discrete and continuous cases of a particular problem in which ordering and replace- 


The paper closes with an exact solution of the general storage problem in the case of a finite continuous storage function $S(t)$, fed by a discrete input function of Poisson type, with a continuous output which has a steady rate when $S(t) \ne 0$, and is zero if $S(t) = 0$, and an overflow function such that $S(t)$ never exceeds a prescribed value.


I. PROBLEMS IN THE THEORY OF PROVISIONING 
In his work on the theory of provisioning, Pitt (1946) is concerned with the derivation of stationary probability distributions for a discrete stock function $s(t)$. This stock function represents the number of components in stock at a time $t$, and is allowed to take integral values ranging between $K$ and $-\infty$, where $K = s(0)$ is the initial stock, and negative values indicate that components may be borrowed if they are not in stock. The function is defined for all values of time $t$ ($t$ continuous) by the equation
$$s(t) = K + r(t) - c(t), \tag{1}$$
where $c(t)$ is the consumption function, the number of components consumed up to time $t$, a random increasing step function taking positive integral values, and $r(t)$ is the replacement function, the number of components delivered and added to the stock up to time $t$, a function depending on $c(t)$ and always smaller than or equal to it.

The probability distribution associated with the consumption function $c(t)$ is the Poisson with parameter $\alpha$, so that the probabilities that in a small interval of time $\delta t$ there be one and no components consumed respectively are
$$\Pr\{c(t + \delta t) - c(t) = 1\} = \alpha\,\delta t,\qquad \Pr\{c(t + \delta t) - c(t) = 0\} = 1 - \alpha\,\delta t, \tag{2}$$
and increases in $c(t)$ in non-overlapping intervals are independent. Two types of replacement functions are considered:

Type 1.
$$r(t) = [c(t - T)M^{-1}]M, \tag{3}$$
where we define $[z]$ as the integral part of $z$; here, orders for a constant number of components $M$ are sent out at irregular time intervals $t - T$, when $c(t - T)$ is an integral multiple of $M$, and are delivered and added to the stock at time $t$, after a positive time-lag $T$;



180 Some problems in the theory of provisioning and of dams 
Type 2.
$$r(t) = c([(t-T)\,aM^{-1}]\,Ma^{-1}); \tag{4}$$
here, orders for an irregular number of components equal to the consumption in the previous
time interval $Ma^{-1}$ are sent out at regular times $kMa^{-1}$ (k = 1, 2, ...), and are delivered and
added to stock at times $kMa^{-1} + T$ after a positive time-lag T.

For both these types of replacement function, the stationary probability distributions
of s(t) are derived by arguments which will be followed exactly in §IV of the paper. Using
these probability distributions, Pitt concludes with a comparison of the means of both
the positive and negative values of the stock function s(t) when the replacement functions
are of types 1 and 2.


II. PROBLEMS IN THE THEORY OF DAMS


The problem discussed by Moran (1954), that of obtaining the stationary probability
distribution of the amount of water in a dam, appears at first sight to be of a different nature.
A finite dam of capacity K units, whose content at the times t = 0, 1, 2, ..., after water has
been released according to a prescribed rule, is Z(t), where Z(t) will be found to lie in the
range (0, K − M), is subject to the following conditions:

(1) a discrete input X(t) which flows into the dam during the interval of time (t, t + 1),
where the series {X(t)} is serially independent, and the probability that the input be i units
(i = 0, 1, 2, ...) is $p_i$;

(2) an overflow rule such that in any time interval (t, t + 1) there is no overflow from the
dam if Z(t) + X(t) ≤ K, and there is an overflow Z(t) + X(t) − K if Z(t) + X(t) > K;

(3) a release rule such that at time t + 1, M units are released from the dam if its total
content Z(t) + X(t) ≥ M, and the total content Z(t) + X(t) is released if this is less than M.

Then, provided K − M > M, a system of equations is derived relating the probabilities
$P_0, P_1, \ldots, P_{K-M}$ that Z(t) be equal to 0, 1, ..., K − M at time t, with the probabilities
$P_0^{(1)}, P_1^{(1)}, \ldots, P_{K-M}^{(1)}$ that Z(t + 1) be equal to 0, 1, ..., K − M at time t + 1. These are:
$$\begin{aligned}
P_0^{(1)} &= P_0(p_0 + p_1 + \ldots + p_M) + P_1(p_0 + \ldots + p_{M-1}) + \ldots + P_M p_0,\\
P_1^{(1)} &= P_0 p_{M+1} + P_1 p_M + \ldots + P_{M+1} p_0,\\
&\;\;\vdots\\
P_{K-M-1}^{(1)} &= P_0 p_{K-1} + P_1 p_{K-2} + \ldots + P_{K-M}\, p_{M-1},\\
P_{K-M}^{(1)} &= P_0 q_K + P_1 q_{K-1} + \ldots + P_{K-M}\, q_M,
\end{aligned} \tag{5}$$
where, for convenience, we have written $q_j = \sum_{i \ge j} p_i$. Written in matrix form, these equations
are $P^{(1)} = pP$, where $P$, $P^{(1)}$ are column vectors with elements $P_i$, $P_i^{(1)}$ respectively, and p is
the matrix of coefficients.

The stationary distribution $\{P_i\}$ is required, and this is obtained by writing $P^{(1)} = P$ in the
set of equations (5) and solving P = pP together with the additional condition $\sum_{i=0}^{K-M} P_i = 1$.
It is pointed out that although no solution in an explicit form may exist in general, the
values of the stationary probabilities $P_i$ can always be evaluated numerically for any known
matrix of coefficients p, and these are in fact computed for a special case. An extension of
this discrete theory also allows equations for the continuous stationary distribution of
U(t) = Z(t) + X(t) to be written when Z(t), X(t) are continuous and U(t) lies in the range
(0, ∞).
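The numerical evaluation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the Poisson input, the values K = 6, M = 2 and mean input 1.5, the helper names, and the use of power iteration (rather than any method used by Moran) are all choices made here, not taken from the paper.

```python
from math import exp

def poisson_pmf(mean, n_terms):
    """Return [Pr{X = 0}, ..., Pr{X = n_terms - 1}] for a Poisson variate."""
    probs, term = [], exp(-mean)
    for i in range(n_terms):
        probs.append(term)
        term *= mean / (i + 1)
    return probs

def dam_transition(K, M, p):
    """Column-stochastic matrix taking the distribution of Z(t) to that of Z(t+1).

    States are 0..K-M; p[x] = Pr{X(t) = x} for x = 0..K-1. Larger inputs always
    fill the dam, so their total probability is lumped into one case."""
    n = K - M + 1
    mat = [[0.0] * n for _ in range(n)]
    for z in range(n):
        below = 0.0
        for x in range(K - z):                 # z + x < K: no overflow
            mat[max(z + x - M, 0)][z] += p[x]
            below += p[x]
        mat[K - M][z] += 1.0 - below           # content reaches K, then M is released
    return mat

def stationary(mat, iterations=5000):
    """Power iteration P <- pP from a uniform start (an assumed method)."""
    n = len(mat)
    P = [1.0 / n] * n
    for _ in range(iterations):
        P = [sum(mat[i][j] * P[j] for j in range(n)) for i in range(n)]
    return P

K, M, mean_input = 6, 2, 1.5                   # illustrative values with K - M > M
p = poisson_pmf(mean_input, K)
matrix = dam_transition(K, M, p)
P = stationary(matrix)
```

The first row of `matrix` reproduces the first line of (5): its leading entry is $p_0 + p_1 + \ldots + p_M$.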


J. GANI 181 


III. PROVISIONING AND DAM PROBLEMS AS GENERAL STORAGE PROBLEMS 
We now proceed to identify these problems of the stock function s(t) of provisioning, and 
of the dam content Z(t) in dam theory as particular problems of a general storage function 
S(t). This may be discrete or continuous, and defined for continuous time t or at fixed times 
such as t = 0, 1, 2, ..., according to the conditions of the problem, by the equation 


$$S(t) = I(t) - D(t) - F(t), \tag{6}$$

where I(t), D(t), F(t) are positive increasing functions, discrete or continuous. I(t) is a
random input function which feeds the storage function, D(t) is an output function which 
depletes the storage function and may depend on I(t) or be otherwise defined, and F(t) is 
an overflow function which operates to restrict the range of S(t) below a prescribed maxi- 
mum value. 

To show that the problems of Pitt's stock function s(t) of provisioning can be regarded as
problems of a particular storage function S(t), we consider the previously defined stock
function (1),
$$s(t) = K + r(t) - c(t),$$
which lies in the range −∞ < s(t) ≤ K. We note that this equation holds at any time t
(t continuous), and that c(t), the random discrete consumption function, and r(t), the
discrete replacement function (r(t) ≤ c(t)), are defined as in §I. We can here equally well
consider the stock deficit,
$$K - s(t) = c(t) - r(t), \tag{7}$$
which ranges from zero when the stock is at its maximum K, to ∞ as the stock decreases
through 0 to −∞. Naturally this is possible only when the stock has a maximum value K
which it cannot exceed. Comparing this with a storage function
$$S(t) = I(t) - D(t), \tag{8}$$
for which F(t) = 0 for all t, it is clear that a discrete storage function S(t) defined in the range
(0, ∞) for all time t (t continuous), is identifiable with the deficit K − s(t). Similarly, I(t),
a discrete random input function, is identifiable with the discrete random consumption
function c(t), and D(t), a discrete output function (D(t) ≤ I(t)), with a discrete replacement
function r(t). F(t) will be zero at all times, since the range of the stock deficit is (0, ∞), and
no overflow is required to restrict this range below any prescribed value.

In the specific problems considered by Pitt, where the consumption function c(t) was of
Poisson type with parameter a, and the replacement functions of the types previously
defined in (3), (4), as $r_1(t)$ and $r_2(t)$, the general storage equation would be such that
$$S(t) = K - s(t), \quad I(t) = c(t), \quad D(t) = r_1(t) \text{ or } r_2(t), \quad F(t) = 0.$$
However, generally, for any stock function s(t), whether continuous or discrete, defined for
continuous time t or at discrete intervals of time, providing only that the maximum stock
K is finite, it is always possible to frame the general storage equation (6) for the stock deficit.
The finite maximum stock K results in the storage function S(t) having a range with lower
bound 0; its upper bound may be infinite, as in the case just considered, or finite as we shall
see in §V. I(t), D(t) and F(t) will be appropriately chosen to fit the conditions of the problem.

Similarly, we proceed to show that Moran's discrete dam content Z(t) with range (0, K − M)
can be regarded as a particular storage function. In this case, however, the function Z(t)




is defined only at the times t = 0, 1, 2, ..., and we shall also define the storage function at
these times. In our storage equation (6), we may identify S(t) with Z(t) − Z(0), the increase
in dam content after a time t; if, without loss in generality, we assume that Z(0) = 0, that is,
the dam is empty at time t = 0, we have that S(t) the storage function is equal to the dam
content Z(t). The random discrete input function I(t) would be the sum of the discrete
random inputs X(0), X(1), ..., X(t − 1) in the time intervals (0, 1), (1, 2), ..., (t − 1, t), so that
$$I(t) = \sum_{\tau=1}^{t} X(\tau-1).$$
The discrete output function D(t) would be the sum of discrete outputs released at times
τ = 1, 2, ..., t; let these be written d(1), d(2), ..., d(t), then we have that
$$d(\tau) = M \quad \text{if } Z(\tau-1) + X(\tau-1) \ge M,$$
or
$$d(\tau) = Z(\tau-1) + X(\tau-1) \quad \text{if this is less than } M,$$
so that the output function is
$$D(t) = \sum_{\tau=1}^{t} d(\tau).$$
Finally, the overflow function F(t) would be the sum of the discrete overflows in the time
intervals (0, 1), ..., (t − 1, t); let these be written f(0), ..., f(t − 1), then we have that
$$f(\tau-1) = 0 \quad \text{if } Z(\tau-1) + X(\tau-1) \le K,$$
or
$$f(\tau-1) = Z(\tau-1) + X(\tau-1) - K \quad \text{if this is greater than zero,}$$
so that the overflow function is
$$F(t) = \sum_{\tau=1}^{t} f(\tau-1).$$
The equation for Z(t) at times t = 0, 1, 2, ... could be written
$$Z(t) = \sum_{\tau=1}^{t} X(\tau-1) - \sum_{\tau=1}^{t} d(\tau) - \sum_{\tau=1}^{t} f(\tau-1), \tag{9}$$


a particular case of the storage function S(t). For a continuous dam content, a similar
equation could also be written. More generally, for a Z(t) continuous or discrete, defined
at any time t (t continuous) or at discrete times, with a finite or infinite range, it is possible
to frame a general storage equation.
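The identification of the dam content with the storage equation can be checked by direct bookkeeping. The sketch below follows the rules for d(τ) and f(τ − 1) given above and asserts at every step that the content equals I(t) − D(t) − F(t). The function name, the seed, the values K = 8, M = 3 and the uniformly distributed integer inputs are illustrative assumptions, not part of the paper.

```python
import random

def simulate_dam(K, M, inputs):
    """Follow Z(t) under the overflow and release rules, tracking I(t), D(t), F(t)."""
    z, I, D, F = 0, 0, 0, 0          # Z(0) = 0: the dam starts empty
    history = [z]
    for x in inputs:                 # x plays the role of X(tau - 1)
        level = z + x
        f = max(level - K, 0)        # overflow f(tau - 1)
        d = min(M, level)            # release d(tau)
        z = level - f - d
        I, D, F = I + x, D + d, F + f
        assert z == I - D - F        # the storage equation for the dam content
        history.append(z)
    return history, (I, D, F)

rng = random.Random(0)               # fixed seed; the inputs are arbitrary
inputs = [rng.randint(0, 6) for _ in range(1000)]
history, totals = simulate_dam(K=8, M=3, inputs=inputs)
```

Since K > M, the content recorded after each release never leaves the range (0, K − M).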

In neither the case of problems of provisioning nor of dam problems will the storage


method of attack on a problem in the one theory may be useful in analogous problems of 
the other. 

We proceed to interpret Pitt's results in provisioning theory as solutions in two analogous 
cases of the infinite dam. We shall also apply Moran's methods in the theory of dams to 
some problems in the theory of provisioning. 




IV. INTERPRETATION OF PITT'S RESULTS IN THE CASE OF THE INFINITE DAM
Pitt's equation for the stock deficit
$$K - s(t) = c(t) - r(t) \qquad (r(t) \le c(t)),$$
was seen to be an example of the storage equation
$$S(t) = I(t) - D(t),$$
where at any time t (t continuous), S(t) is discrete and lies in the range (0, ∞), I(t) is a random
discrete input function, and D(t) is a discrete output function depending on I(t), (D(t) ≤ I(t)).
Equally, a storage equation of this form can represent the discrete dam content Z(t)
(S(t) = Z(t)), at any time t, where the dam is infinite (0 ≤ Z(t) < ∞). In this case, interpreting
the specific conditions of Pitt's problems of provisioning for an infinite dam, we have that
the discrete input function I(t) will be of Poisson type with parameter a such that, in a small
interval of time δt, the probabilities that one and no units of water flow into the dam are


$$\Pr\{I(t+\delta t) - I(t) = 1\} = a\,\delta t, \qquad \Pr\{I(t+\delta t) - I(t) = 0\} = 1 - a\,\delta t, \tag{10}$$
and increases in I(t) in non-overlapping intervals are independent. The release rules
corresponding to Pitt's two replacement functions are such that the resulting output functions
will be:

Type 1.
$$D(t) = [I(t-T)\,M^{-1}]\,M; \tag{11}$$
here, releases of M units are made at irregular time intervals t, after a time-lag T from the
times t − T when I(t − T) is an integral multiple of M. In practice, this might mean that
at every time when the input increased by M units, a decision would be taken to release
M units from the dam, but this decision would be carried out after a time-lag T;

Type 2.
$$D(t) = I([(t-T)\,aM^{-1}]\,Ma^{-1}); \tag{12}$$
here, releases of an irregular number of units equal to the input in the previous time interval
$((k-1)Ma^{-1}, kMa^{-1})$ are made at regular times $kMa^{-1} + T$ (k = 1, 2, ...), with time-lag T
after the interval. In practice, this might mean that at the regular times $kMa^{-1}$, a decision
would be taken to release the input in the previous interval $Ma^{-1}$, but this decision would
be carried out after a time-lag T.

We proceed to follow Pitt exactly in deriving the stationary distributions of the dam
content Z(t) when the release rules of types 1 and 2 are functioning, and in comparing the
mean contents of the dam for these cases.


(1) The stationary distribution of Z(t) for release rule of type 1
In this case, where the output function is given by (11), the storage equation for the dam
content can be written
$$Z(t) = I(t) - [I(t-T)\,M^{-1}]\,M \qquad (0 \le Z(t) < \infty).$$
Let the input function at times t and t − T be such that
$$I(t-T) = i \quad \text{and} \quad I(t) - I(t-T) = j,$$
where i and j can take the values 0, 1, 2, .... Then, writing
$$P_n(t) = \Pr\{Z(t) = n\} \qquad (n = 0, 1, 2, \ldots),$$



for the probability that the dam content be n units at time t, we have that
$$P_n(t) = \Pr\{i + j - [iM^{-1}]\,M = n\}.$$
Now suppose that
$$kM \le i \le (k+1)M - 1 \qquad (k = 0, 1, \ldots),$$
then it follows that
$$[iM^{-1}] = k, \quad i = kM + v \quad \text{and} \quad j = n - v,$$
where v can take the values 0, 1, ..., M − 1. From the independence condition for increases
in I(t) in non-overlapping intervals, it follows that
$$P_n(t) = \sum_{v=0}^{M-1}\sum_{k=0}^{\infty} \Pr\{I(t-T) = kM+v\}\,\Pr\{I(t) - I(t-T) = n-v\}
= \sum_{v=0}^{M-1} g_{n-v} \sum_{k=0}^{\infty} e^{-a(t-T)}\,\frac{\{a(t-T)\}^{kM+v}}{(kM+v)!}, \tag{13}$$
where we write $g_r = e^{-aT}(aT)^r/r!$ for r ≥ 0, and $g_r = 0$ for r < 0.
The stationary distribution is obtained when t → ∞; we shall write it $P_n = \lim_{t\to\infty} P_n(t)$. It
follows that
$$P_n = \sum_{v=0}^{M-1} g_{n-v} \lim_{t\to\infty} \sum_{k=0}^{\infty} e^{-a(t-T)}\,\frac{\{a(t-T)\}^{kM+v}}{(kM+v)!}. \tag{14}$$
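Now consider the series $\sum_{k=0}^{\infty} y^{v+kM}/(v+kM)!$, where y = a(t − T); if y = v + (N + 1)M, we have
that since v ≤ M − 1, then for k = 0,
$$1 + y + \ldots + \frac{y^{v}}{v!} \le M\frac{y^{v}}{v!} < \frac{y^{v}}{v!} + \ldots + \frac{y^{v+M-1}}{(v+M-1)!}. \tag{15}$$
For k = 1, 2, ..., N, the following inequalities hold:
$$\frac{y^{v+kM-M+1}}{(v+kM-M+1)!} + \ldots + \frac{y^{v+kM}}{(v+kM)!} < M\frac{y^{v+kM}}{(v+kM)!} < \frac{y^{v+kM}}{(v+kM)!} + \ldots + \frac{y^{v+kM+M-1}}{(v+kM+M-1)!}; \tag{16}$$
but for k = N + 1, only one side of the inequality is valid
$$\frac{y^{v+NM+1}}{(v+NM+1)!} + \ldots + \frac{y^{v+(N+1)M}}{(v+(N+1)M)!} \le M\frac{y^{v+(N+1)M}}{(v+(N+1)M)!}. \tag{17}$$
Further, for all values of k greater than N + 1 (k = N + 2, ...), the inequalities are reversed
$$\frac{y^{v+kM}}{(v+kM)!} + \ldots + \frac{y^{v+kM+M-1}}{(v+kM+M-1)!} < M\frac{y^{v+kM}}{(v+kM)!} < \frac{y^{v+kM-M+1}}{(v+kM-M+1)!} + \ldots + \frac{y^{v+kM}}{(v+kM)!}. \tag{18}$$
Summing the inequalities (15), (16), (17) and (18), for all values of k, and multiplying the
result by $e^{-y}$, we obtain
$$M^{-1}\left\{1 - (M-1)\,e^{-y}\frac{y^{v+(N+1)M+1}}{(v+(N+1)M+1)!}\right\} \le \sum_{k=0}^{\infty} e^{-y}\frac{y^{v+kM}}{(v+kM)!} \le M^{-1}\left\{1 + (M-1)\,e^{-y}\frac{y^{v+(N+1)M}}{(v+(N+1)M)!}\right\}.$$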




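Now let N → ∞; y will also tend to infinity, and we have that
$$M^{-1} \le \lim_{y\to\infty} \sum_{k=0}^{\infty} e^{-y}\frac{y^{v+kM}}{(v+kM)!} \le M^{-1},$$
so that the equation (14) for $P_n$ can be written
$$P_n = M^{-1}\sum_{v=0}^{M-1} g_{n-v}. \tag{19}$$
The mean content of the dam is given by
$$\mathcal{E}(Z) = M^{-1}\sum_{n=0}^{\infty} n \sum_{v=0}^{M-1} g_{n-v}
= M^{-1}\sum_{n=0}^{\infty} g_n\{n + (n+1) + \ldots + (n+M-1)\}
= aT + \tfrac{1}{2}(M-1). \tag{20}$$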
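As a quick numerical check of (19) and (20), the stationary probabilities can be tabulated and their mean compared with aT + ½(M − 1). The sketch below is illustrative only: the values M = 4 and aT = 2, the truncation at n = 80 and the function names are assumptions made here.

```python
from math import exp

def poisson_terms(mean, n_terms):
    """g_r = e^{-mean} mean^r / r! for r = 0..n_terms-1, built iteratively."""
    terms, term = [], exp(-mean)
    for r in range(n_terms):
        terms.append(term)
        term *= mean / (r + 1)
    return terms

def type1_distribution(M, aT, n_max):
    """P_n = (1/M) * sum_{v=0}^{M-1} g_{n-v}, with g_r = 0 for r < 0, as in (19)."""
    g = poisson_terms(aT, n_max + 1)
    return [sum(g[n - v] for v in range(M) if n - v >= 0) / M
            for n in range(n_max + 1)]

M, aT = 4, 2.0                       # illustrative values
P = type1_distribution(M, aT, 80)    # the tail beyond n = 80 is negligible here
mean = sum(n * Pn for n, Pn in enumerate(P))
```

With these values the computed mean should agree with aT + ½(M − 1) = 3.5 to within rounding error.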


(2) The stationary distribution of Z(t) for release rule of type 2
In this case, the output function is (12), and releases are made at regular times $kMa^{-1} + T$
(k = 0, 1, ...). Consider the time interval
$$kMa^{-1} + T \le t < (k+1)Ma^{-1} + T$$
between two releases, where T may have any value ($T \gtreqless Ma^{-1}$); at any time in this interval,
the content Z(t) of the dam has the value
$$Z(t) = I(t) - I([(t-T)\,aM^{-1}]\,Ma^{-1}) = I(t) - I(kMa^{-1}).$$
Now if we write for the probability that the dam content be n units at time t,
$$P_n(t) = \Pr\{Z(t) = n\} \qquad (n = 0, 1, 2, \ldots),$$
then we have that
$$P_n(t) = \Pr\{I(t) - I(kMa^{-1}) = n\} = e^{-(at-kM)}\,\frac{(at-kM)^n}{n!}. \tag{21}$$
This is periodic in $Ma^{-1}$, so that in order to obtain the stationary distribution $\{P_n\}$, we write
$$P_n = aM^{-1}\int_{t=kMa^{-1}+T}^{(k+1)Ma^{-1}+T} P_n(t)\,dt = M^{-1}\int_{u=0}^{M} e^{-(u+aT)}\,\frac{(u+aT)^n}{n!}\,du,$$
where $u = a(t - kMa^{-1} - T)$.
On expanding the binomial, we have
$$P_n = \sum_{v=0}^{n} e^{-aT}\frac{(aT)^{n-v}}{(n-v)!}\; M^{-1}\int_0^M e^{-u}\frac{u^v}{v!}\,du = \sum_{v=0}^{n} g_{n-v}\,Q_v, \tag{22}$$
where
$$Q_v = M^{-1}\int_0^M e^{-u}\frac{u^v}{v!}\,du.$$

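The mean of the dam content in this case is given by
$$\mathcal{E}(Z) = \sum_{n=0}^{\infty} n \sum_{v=0}^{n} g_{n-v}\,Q_v
= \sum_{v=0}^{\infty} Q_v \sum_{n=0}^{\infty} (n+v)\,g_n
= \sum_{v=0}^{\infty} Q_v\,(aT + v).$$
It is easy to see that
$$\sum_{v=0}^{\infty} Q_v = M^{-1}\int_0^M \sum_{v=0}^{\infty} e^{-u}\frac{u^v}{v!}\,du = 1, \qquad
\sum_{v=0}^{\infty} v\,Q_v = M^{-1}\int_0^M \sum_{v=0}^{\infty} v\,e^{-u}\frac{u^v}{v!}\,du = \tfrac{1}{2}M,$$
whence the mean is obtained as
$$\mathcal{E}(Z) = aT + \tfrac{1}{2}M. \tag{23}$$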


Comparing (23) with (20), we note that the second release rule gives a larger mean content 
of the dam than the first. 
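The comparison of (23) with (20) can be illustrated numerically. The sketch below evaluates $Q_v$ through the standard identity $\int_0^M e^{-u}u^v/v!\,du = 1 - \sum_{j \le v} e^{-M}M^j/j!$ (this closed form is introduced here for convenience and is not quoted from the paper), then builds $P_n$ from (22). The values M = 4, aT = 2, the truncation point and all names are illustrative assumptions.

```python
from math import exp

def poisson_terms(mean, n_terms):
    """e^{-mean} mean^r / r! for r = 0..n_terms-1, built iteratively."""
    terms, term = [], exp(-mean)
    for r in range(n_terms):
        terms.append(term)
        term *= mean / (r + 1)
    return terms

def Q_values(M, v_max):
    """Q_v = (1/M) * (1 - Poisson(M) distribution function at v), v = 0..v_max."""
    terms = poisson_terms(M, v_max + 1)
    Q, cdf = [], 0.0
    for v in range(v_max + 1):
        cdf += terms[v]
        Q.append((1.0 - cdf) / M)
    return Q

def type2_distribution(M, aT, n_max):
    """P_n = sum_{v=0}^{n} g_{n-v} Q_v, as in (22)."""
    g = poisson_terms(aT, n_max + 1)
    Q = Q_values(M, n_max)
    return [sum(g[n - v] * Q[v] for v in range(n + 1)) for n in range(n_max + 1)]

M, aT = 4, 2.0                        # illustrative values
Q = Q_values(M, 8)
P = type2_distribution(M, aT, 80)     # truncation at n = 80 is an assumption
mean_type2 = sum(n * Pn for n, Pn in enumerate(P))
mean_type1 = aT + (M - 1) / 2         # formula (20), for comparison
```

Under these assumptions `mean_type2` should be close to aT + ½M = 4, exceeding the type 1 mean by ½.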


V. APPLICATION OF MORAN'S METHODS TO PROVISIONING WITH A FINITE POSITIVE STOCK


In dealing with provisioning, our notation will be more suggestive of the conditions of the
problems if we write the general storage equation in the form
$$S(t) = K - s(t) = c(t) - r(t) - F(t), \tag{24}$$
where S(t) is the stock deficit, s(t) the stock with maximum value K, and c(t), r(t) and F(t) the
consumption, replacement and overflow functions respectively.

Pitt considered cases where, since components not in stock could be borrowed, the deficit
could become infinite, so that for replacement functions always smaller than or equal to
consumption functions (r(t) ≤ c(t)), no overflow function was necessary, and F(t) was equal
to zero. We shall discuss the case, in practice equally realistic, where the stock is finite and
positive (0 ≤ s(t) ≤ K), so that the deficit is also finite and positive (0 ≤ S(t) ≤ K). This is the
case where no borrowing is allowed, and consumption is lost between times when the stock
becomes zero (or the deficit K) until there is a replacement which raises the stock above zero
(or decreases the deficit below K). This will require, in cases where r(t) is smaller than c(t),
that an overflow function operates so as to restrict the deficit to a value no greater than K,


equations which gave its stationary probability distribution; we proceed to apply Moran’s 
methods to three particular problems of provisioning with finite positive stock. 


(1) Provisioning with replacements at fixed times t = 1, 2, ..., and no time-lag
This problem was designed to provide an example of an exact analogue in provisioning
theory of the dam problem considered by Moran; in practice, however, it gives rise to a
perfectly possible situation. Suppose that at fixed times t = 1, 2, ..., a consignment of M
components arrives at a store for a stock replacement; if at these times there is a deficit
equal to or greater than M, the replacement consists of all M components, but if the deficit



is smaller than M, only enough components are added to the stock from the consignment
to reduce the deficit to zero. There is no time-lag between the evaluation of the deficit at
times t − 0, and the deliveries of components for replacements at times t. This might well be
the case for a truck containing a consignment of M components which, in its regular
deliveries at a store, discharges for replacements only as many of the components as may
be required. This replacement policy can be expressed by the replacement function r(t),
a step function with jumps at times t = 1, 2, ..., such that
$$r(t) - r(t-1) = M \quad \text{if } S(t-0) \ge M,$$
or
$$r(t) - r(t-1) = S(t-0) \quad \text{if } S(t-0) < M.$$
Let the discrete consumption function c(t) for the store be defined in any interval of time
θ by the distribution $\{p_i(\theta)\}$ (i = 0, 1, 2, ...), such that
(1) $\Pr\{c(t+\theta) - c(t) = i\} = p_i(\theta)$,
(2) increases in c(t) in non-overlapping time intervals are independent. (25)


The consumption will be met by components from stock only so long as the stock is positive
(or the deficit less than K), but when the stock is zero (or the deficit K), no further orders
can be met until a replacement arrives. This condition defines the overflow function F(t),
for in the interval of time (t − 1, t) between replacements, the overflow required to restrict
the deficit to a value K will be such that
$$F(t) - F(t-1) = 0 \quad \text{if } S(t-1) + c(t) - c(t-1) \le K,$$
or
$$F(t) - F(t-1) = S(t-1) + c(t) - c(t-1) - K \quad \text{if this is greater than zero.}$$
The storage equation (24) can now be written for times t = 0, 1, 2, ..., since the consumption,
replacement and overflow functions are all defined; this, however, presents no great
advantage as it does not lead to a method of solving the problem of finding the stationary
probability distribution for the deficit S(t).

If for t = 0, 1, 2, ... we write the probabilities that the deficit be i components at times t
and t + 1 respectively as
$$P_i = \Pr\{S(t) = i\}, \qquad P_i^{(1)} = \Pr\{S(t+1) = i\},$$
where i takes the values 0, 1, ..., K − M, and if we write $p_i = p_i(1)$ and $q_j = \sum_{i \ge j} p_i$, then for
K − M > M the equations relating the probabilities are identical to (5), those obtained by
Moran in his dam problem. In matrix form, we write $P^{(1)} = pP$, where $P$, $P^{(1)}$ are column
vectors with elements $P_i$, $P_i^{(1)}$ respectively, and p is the matrix of coefficients. Exactly as
in Moran's problem, the stationary distribution $P_0, P_1, \ldots, P_{K-M}$ is obtained by solving the
equations P = pP together with $\sum_{i=0}^{K-M} P_i = 1$; these $\{P_i\}$ cannot always be obtained in an
explicit form, but they can always be evaluated numerically for any given values of the $\{p_i\}$.

(2) Provisioning with replacements at fixed times
$t = kMa^{-1} + T$ (k = 1, 2, ...) with time-lag $T < Ma^{-1}$


(a) General equations for the stationary distributions 
This problem stems from Pitt’s second problem of provisioning under replacement policy 
of type 2, with replacement function (4). The differences arising are due to the restriction of 


the stock s(t) to the range 0 ≤ s(t) ≤ K, or the deficit S(t) to the range 0 ≤ S(t) ≤ K, and to the
consideration of the deficit at specific time intervals only. The consumption function c(t)
is defined by (25) as in the previous problem, and the replacement function (4) is
$$r_2(t) = c([(t-T)\,aM^{-1}]\,Ma^{-1}),$$
where an irregular number of components equal to the consumption in the previous time
interval $Ma^{-1}$ are ordered at regular times $kMa^{-1}$ (k = 1, 2, ...), and are delivered at times
$kMa^{-1} + T$, after a positive time-lag T, where $T < Ma^{-1}$. As the deficit is restricted to values
no greater than K, consumption is lost in the intervals after the deficit reaches the value K
until a replacement arrives to reduce the deficit to a value less than K. This defines the
overflow function in such a way that, in the intervals $kMa^{-1}$, $kMa^{-1} + T$, between orders
and deliveries, the overflow is


$$F(kMa^{-1}+T) - F(kMa^{-1}) = 0 \quad \text{if } S(kMa^{-1}) + c(kMa^{-1}+T) - c(kMa^{-1}) \le K,$$
or
$$F(kMa^{-1}+T) - F(kMa^{-1}) = S(kMa^{-1}) + c(kMa^{-1}+T) - c(kMa^{-1}) - K \quad \text{if this is greater than zero.} \tag{26}$$
Similarly, in the intervals $kMa^{-1} + T$, $(k+1)Ma^{-1}$, between deliveries and new orders, the
overflow is
$$F((k+1)Ma^{-1}) - F(kMa^{-1}+T) = 0 \quad \text{if } S(kMa^{-1}+T) + c((k+1)Ma^{-1}) - c(kMa^{-1}+T) \le K,$$
or
$$F((k+1)Ma^{-1}) - F(kMa^{-1}+T) = S(kMa^{-1}+T) + c((k+1)Ma^{-1}) - c(kMa^{-1}+T) - K \quad \text{if this is greater than zero.}$$
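The storage equation (24) is now fully defined at times $kMa^{-1}$, $kMa^{-1} + T$, but, as in the
previous problem, this does not lead to a solution of the stationary distribution of S(t),
the deficit. We can, however, obtain sets of equations relating the probabilities
$$P_0, P_1, \ldots, P_K; \qquad P_0^{(1)}, P_1^{(1)}, \ldots, P_K^{(1)}; \qquad \text{and} \qquad P_0^{(2)}, P_1^{(2)}, \ldots, P_K^{(2)},$$
that the deficit S(t) be equal to 0, 1, 2, ..., K, at the times $kMa^{-1}$, $kMa^{-1} + T$, $(k+1)Ma^{-1}$
respectively. Writing $p_i = p_i(T)$, and $q_j = \sum_{i \ge j} p_i$, we have for the times $kMa^{-1}$, $kMa^{-1} + T$,
the relations
$$\begin{aligned}
P_0^{(1)} &= p_0(P_0 + P_1 + \ldots + P_{K-1}) + q_0 P_K,\\
P_1^{(1)} &= p_1(P_0 + P_1 + \ldots + P_{K-2}) + q_1 P_{K-1},\\
&\;\;\vdots\\
P_K^{(1)} &= q_K P_0,
\end{aligned} \tag{27}$$
or in matrix form, $P^{(1)} = pP$:
$$\begin{pmatrix} P_0^{(1)}\\ P_1^{(1)}\\ \vdots\\ P_{K-1}^{(1)}\\ P_K^{(1)} \end{pmatrix}
= \begin{pmatrix}
p_0 & p_0 & \ldots & p_0 & q_0\\
p_1 & p_1 & \ldots & q_1 & 0\\
\vdots & & & & \vdots\\
p_{K-1} & q_{K-1} & \ldots & 0 & 0\\
q_K & 0 & \ldots & 0 & 0
\end{pmatrix}
\begin{pmatrix} P_0\\ P_1\\ \vdots\\ P_{K-1}\\ P_K \end{pmatrix}. \tag{28}$$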


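Similarly, writing $p_i^{(1)} = p_i(Ma^{-1} - T)$ and $q_j^{(1)} = \sum_{i \ge j} p_i^{(1)}$, we have for the times $kMa^{-1} + T$,
$(k+1)Ma^{-1}$, the relations
$$\begin{aligned}
P_0^{(2)} &= p_0^{(1)} P_0^{(1)},\\
P_1^{(2)} &= p_1^{(1)} P_0^{(1)} + p_0^{(1)} P_1^{(1)},\\
&\;\;\vdots\\
P_K^{(2)} &= q_K^{(1)} P_0^{(1)} + q_{K-1}^{(1)} P_1^{(1)} + \ldots + q_0^{(1)} P_K^{(1)},
\end{aligned} \tag{29}$$
or in matrix form, $P^{(2)} = p^{(1)}P^{(1)}$:
$$\begin{pmatrix} P_0^{(2)}\\ P_1^{(2)}\\ \vdots\\ P_{K-1}^{(2)}\\ P_K^{(2)} \end{pmatrix}
= \begin{pmatrix}
p_0^{(1)} & 0 & \ldots & 0 & 0\\
p_1^{(1)} & p_0^{(1)} & \ldots & 0 & 0\\
\vdots & & & & \vdots\\
p_{K-1}^{(1)} & p_{K-2}^{(1)} & \ldots & p_0^{(1)} & 0\\
q_K^{(1)} & q_{K-1}^{(1)} & \ldots & q_1^{(1)} & q_0^{(1)}
\end{pmatrix}
\begin{pmatrix} P_0^{(1)}\\ P_1^{(1)}\\ \vdots\\ P_{K-1}^{(1)}\\ P_K^{(1)} \end{pmatrix}. \tag{30}$$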


The relations between $P^{(2)}$ and $P$, the probabilities of S(t) at consecutive ordering times
$kMa^{-1}$, $(k+1)Ma^{-1}$, are then given by
$$P^{(2)} = p^{(1)}pP, \tag{31}$$
where the matrix $p^{(1)}p$ is square. For ordering times, the stationary probability distribution
$P_0, P_1, \ldots, P_K$ can be found by putting $P_i^{(2)} = P_i$ for i = 0, 1, ..., K, and solving the set of
equations $P = p^{(1)}pP$ together with $\sum_{i=0}^{K} P_i = 1$. This solution may not be obtained easily in
an explicit form, but it can always be computed numerically for any given distribution
$\{p_i(\theta)\}$, i = 0, 1, .... We note, in passing, that this stationary distribution for ordering
times will be easier to compute if we first obtain the stationary probability distribution
$P_0^{(1)}, P_1^{(1)}, \ldots, P_K^{(1)}$, for delivery times. Writing $P_i^{(3)} = \Pr\{S((k+1)Ma^{-1} + T) = i\}$ for the
probability that S(t) have a value i at delivery time $(k+1)Ma^{-1} + T$, we have that the
relation between $P^{(3)}$ and $P^{(1)}$, the probabilities of S(t) at consecutive delivery times
$kMa^{-1} + T$, $(k+1)Ma^{-1} + T$, is given by
$$P^{(3)} = pp^{(1)}P^{(1)}, \tag{32}$$
where the matrix $pp^{(1)}$ is triangular. The stationary probability distribution for delivery
times, $P_0^{(1)}, P_1^{(1)}, \ldots, P_K^{(1)}$, can be found by putting $P_i^{(3)} = P_i^{(1)}$ for i = 0, 1, ..., K, and solving the
equations $P^{(1)} = pp^{(1)}P^{(1)}$, together with $\sum_{i=0}^{K} P_i^{(1)} = 1$. For numerical computation of $P^{(1)}$,
the work is considerably lightened by the fact that $pp^{(1)}$ is triangular; once the stationary
probability distribution for delivery times is obtained, the simple relation $P = p^{(1)}P^{(1)}$ will
enable the stationary distribution for ordering times, $P_0, P_1, \ldots, P_K$, to be found.


(b) A numerical example

As an illustration of the computations necessary to evaluate the stationary distributions
$\{P_i\}$ and $\{P_i^{(1)}\}$, for ordering and delivery times, we construct an example in which the
consumption function c(t) has, associated with it, a Poisson distribution such that
$$p_i(\theta) = e^{-a\theta}\,\frac{(a\theta)^i}{i!} \qquad (i = 0, 1, \ldots).$$


We choose the maximum deficit K = 9, the time-lag between order and delivery $T = \tfrac{1}{2}$,
the time interval between orders $Ma^{-1} = 1$, and the mean consumption per unit time a = 4.
Then, since $T = Ma^{-1} - T = \tfrac{1}{2}$, we have that the $p_i$ and $p_i^{(1)}$ defined previously are equal,
$$p_i = p_i^{(1)} = e^{-2}\,\frac{2^i}{i!} \qquad (i = 0, 1, \ldots).$$
From Molina's tables for Poisson's Exponential Binomial Limit, we obtain

$p_0 = p_0^{(1)} = 0.135335$,   $q_0 = q_0^{(1)} = 1.000000$,
$p_1 = p_1^{(1)} = 0.270671$,   $q_1 = q_1^{(1)} = 0.864665$,
$p_2 = p_2^{(1)} = 0.270671$,   $q_2 = q_2^{(1)} = 0.593994$,
$p_3 = p_3^{(1)} = 0.180447$,   $q_3 = q_3^{(1)} = 0.323324$,
$p_4 = p_4^{(1)} = 0.090224$,   $q_4 = q_4^{(1)} = 0.142877$,   (33)
$p_5 = p_5^{(1)} = 0.036089$,   $q_5 = q_5^{(1)} = 0.052653$,
$p_6 = p_6^{(1)} = 0.012030$,   $q_6 = q_6^{(1)} = 0.016564$,
$p_7 = p_7^{(1)} = 0.003437$,   $q_7 = q_7^{(1)} = 0.004534$,
$p_8 = p_8^{(1)} = 0.000859$,   $q_8 = q_8^{(1)} = 0.001097$,
                                $q_9 = q_9^{(1)} = 0.000237$.
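The equations giving the stationary distribution $\{P_i^{(1)}\}$ for delivery times are $P^{(1)} = pp^{(1)}P^{(1)}$,
together with $\sum_{i=0}^{K} P_i^{(1)} = 1$; these, in the case of the $\{p_i\}$ and $\{q_i\}$ given in (33), result in eleven
equations, not all independent, for ten unknowns:
$$\begin{aligned}
&-0.864460P_0^{(1)} + 0.136284P_1^{(1)} + 0.139256P_2^{(1)} + 0.149657P_3^{(1)} + 0.180862P_4^{(1)}\\
&\qquad + 0.258876P_5^{(1)} + 0.414902P_6^{(1)} + 0.648941P_7^{(1)} + 0.882981P_8^{(1)} + P_9^{(1)} = 0,\\
&0.271117P_0^{(1)} - 0.727584P_1^{(1)} + 0.276590P_2^{(1)} + 0.287625P_3^{(1)} + 0.310012P_4^{(1)}\\
&\qquad + 0.339183P_5^{(1)} + 0.343934P_6^{(1)} + 0.270671P_7^{(1)} + 0.117019P_8^{(1)} = 0,\\
&0.271486P_0^{(1)} + 0.273334P_1^{(1)} - 0.722144P_2^{(1)} + 0.285591P_3^{(1)} + 0.290341P_4^{(1)}\\
&\qquad + 0.270671P_5^{(1)} + 0.197408P_6^{(1)} + 0.080388P_7^{(1)} = 0,\\
&0.181348P_0^{(1)} + 0.182615P_1^{(1)} + 0.183837P_2^{(1)} - 0.819553P_3^{(1)} + 0.160777P_4^{(1)}\\
&\qquad + 0.111934P_5^{(1)} + 0.043756P_6^{(1)} = 0,\\
&0.090630P_0^{(1)} + 0.090224P_1^{(1)} + 0.086834P_2^{(1)} + 0.075304P_3^{(1)} - 0.949117P_4^{(1)} + 0.019336P_5^{(1)} = 0,\\
&0.035683P_0^{(1)} + 0.033922P_1^{(1)} + 0.028904P_2^{(1)} + 0.019136P_3^{(1)} + 0.007126P_4^{(1)} - P_5^{(1)} = 0,\\
&0.011129P_0^{(1)} + 0.009368P_1^{(1)} + 0.006111P_2^{(1)} + 0.002242P_3^{(1)} - P_6^{(1)} = 0,\\
&0.002623P_0^{(1)} + 0.001692P_1^{(1)} + 0.000614P_2^{(1)} - P_7^{(1)} = 0,\\
&0.000413P_0^{(1)} + 0.000148P_1^{(1)} - P_8^{(1)} = 0,\\
&0.000032P_0^{(1)} - P_9^{(1)} = 0,\\
&P_0^{(1)} + P_1^{(1)} + P_2^{(1)} + P_3^{(1)} + P_4^{(1)} + P_5^{(1)} + P_6^{(1)} + P_7^{(1)} + P_8^{(1)} + P_9^{(1)} = 1.
\end{aligned}$$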


Solving these equations by straightforward elimination, which is greatly simplified by the
fact that $pp^{(1)}$ is triangular, we obtain the following values for $\{P_i^{(1)}\}$:

$P_0^{(1)} = 0.148685$,   $P_5^{(1)} = 0.026833$,
$P_1^{(1)} = 0.281325$,   $P_6^{(1)} = 0.006382$,
$P_2^{(1)} = 0.277080$,   $P_7^{(1)} = 0.001036$,   (34)
$P_3^{(1)} = 0.177621$,   $P_8^{(1)} = 0.000103$,
$P_4^{(1)} = 0.080930$,   $P_9^{(1)} = 0.000005$.

From these we can compute the stationary distribution $\{P_i\}$ for ordering times, using the
relations between $P$ and $P^{(1)}$ which are given in matrix form by $P = p^{(1)}P^{(1)}$. This is easier
than solving directly the equations $P = p^{(1)}pP$ and $\sum_{i=0}^{K} P_i = 1$, where the matrix $p^{(1)}p$ is
square. We obtain for $\{P_i\}$ the values:

$P_0 = 0.020122$,   $P_5 = 0.154360$,
$P_1 = 0.078318$,   $P_6 = 0.099024$,
$P_2 = 0.153890$,   $P_7 = 0.053658$,
$P_3 = 0.202012$,   $P_8 = 0.025004$,
$P_4 = 0.198206$,   $P_9 = 0.015410$.
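The computation of this example can be retraced with a short program. The sketch below builds the matrices of (28) and (30) for K = 9, a = 4, T = ½, M = 4, and finds the delivery-time distribution by power iteration, which is a substitute chosen here for the elimination used in the text. Exact Poisson probabilities replace the six-figure table values, so agreement with (34) can be expected only to roughly four decimal places. All function names are hypothetical.

```python
from math import exp

def poisson_terms(mean, n_terms):
    """e^{-mean} mean^i / i! for i = 0..n_terms-1, built iteratively."""
    terms, term = [], exp(-mean)
    for i in range(n_terms):
        terms.append(term)
        term *= mean / (i + 1)
    return terms

def tail_sums(p):
    """q[j] = sum_{i >= j} p_i, computed as 1 - (p_0 + ... + p_{j-1})."""
    q, head = [], 0.0
    for pj in p:
        q.append(1.0 - head)
        head += pj
    return q

def order_to_delivery(p, q, K):
    """Matrix p of (28): column j is the deficit at an ordering time."""
    m = [[0.0] * (K + 1) for _ in range(K + 1)]
    for i in range(K + 1):
        for j in range(K - i):
            m[i][j] = p[i]
        m[i][K - i] = q[i]
    return m

def delivery_to_order(p1, q1, K):
    """Matrix p^(1) of (30): column j is the deficit at a delivery time."""
    m = [[0.0] * (K + 1) for _ in range(K + 1)]
    for j in range(K + 1):
        for i in range(j, K):
            m[i][j] = p1[i - j]
        m[K][j] = q1[K - j]
    return m

def apply(m, v):
    """Matrix-vector product m v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

K, a, T, M = 9, 4.0, 0.5, 4
p = poisson_terms(a * T, K + 1)
q = tail_sums(p)
p1 = poisson_terms(a * (M / a - T), K + 1)
q1 = tail_sums(p1)
mat_p, mat_p1 = order_to_delivery(p, q, K), delivery_to_order(p1, q1, K)

delivery = [1.0 / (K + 1)] * (K + 1)
for _ in range(500):                       # iterate P(1) <- p p(1) P(1)
    delivery = apply(mat_p, apply(mat_p1, delivery))
ordering = apply(mat_p1, delivery)         # P = p(1) P(1)
```

Here `delivery` plays the role of $\{P_i^{(1)}\}$ and `ordering` that of $\{P_i\}$.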


(с) Comparison with Pitt's results for an infinite deficit 


problem for the infinite deficit, when the consumption function is of Poisson type, and the
replacement function is the $r_2(t)$ defined in (4), is the exact analogue of the problem of the
infinite dam considered in §IV (2). With a slight change in the notation of (22) to avoid
confusion, we write the stationary distribution of the deficit S(t) when its range is 0 ≤ S(t) < ∞
as $\{\Pi_i\}$ (i = 0, 1, ...), where
$$\Pi_i = \sum_{v=0}^{i} p_{i-v}\,Q_v. \tag{35}$$
Here, the $p_i = e^{-aT}(aT)^i/i!$, and the $Q_v$ are given by
$$Q_v = M^{-1}\int_0^M e^{-u}\frac{u^v}{v!}\,du.$$


To do this we consider the time interval θ, where $\theta < Ma^{-1}$, such that $kMa^{-1} + T + \theta$ is
any time between two deliveries at times $kMa^{-1} + T$, and $(k+1)Ma^{-1} + T$. Let the
probabilities that there be consumption i in this interval be
$$p_i(\theta) = e^{-a\theta}\,\frac{(a\theta)^i}{i!},$$



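then if $\{P_i(kMa^{-1} + T + \theta)\}$ (i = 0, 1, ..., K) is the distribution of the deficit S(t) at the time
$kMa^{-1} + T + \theta$, this will be given in matrix form by
$$\begin{pmatrix} P_0(kMa^{-1}+T+\theta)\\ P_1(kMa^{-1}+T+\theta)\\ \vdots\\ P_{K-1}(kMa^{-1}+T+\theta)\\ P_K(kMa^{-1}+T+\theta) \end{pmatrix}
= \begin{pmatrix}
p_0(\theta) & 0 & \ldots & 0 & 0\\
p_1(\theta) & p_0(\theta) & \ldots & 0 & 0\\
\vdots & & & & \vdots\\
p_{K-1}(\theta) & p_{K-2}(\theta) & \ldots & p_0(\theta) & 0\\
q_K(\theta) & q_{K-1}(\theta) & \ldots & q_1(\theta) & q_0(\theta)
\end{pmatrix}
\begin{pmatrix} P_0^{(1)}\\ P_1^{(1)}\\ \vdots\\ P_{K-1}^{(1)}\\ P_K^{(1)} \end{pmatrix},$$
where $q_j(\theta) = \sum_{i \ge j} p_i(\theta)$, and $P_0^{(1)}, P_1^{(1)}, \ldots, P_K^{(1)}$, are the stationary probability distribution of
the deficit at delivery times. We can write these in the forms
$$P_i(kMa^{-1}+T+\theta) = \sum_{v=0}^{i} P_{i-v}^{(1)}\,p_v(\theta) \qquad (i = 0, 1, \ldots, K-1),$$
$$P_K(kMa^{-1}+T+\theta) = \sum_{v=0}^{K} P_{K-v}^{(1)}\,q_v(\theta).$$
The stationary distribution $\{\Pi_i^{(1)}\}$ will then be obtained from the $\{P_i(kMa^{-1} + T + \theta)\}$ by
averaging them over the interval of time $Ma^{-1}$, so that
$$\begin{aligned}
\Pi_i^{(1)} &= aM^{-1}\int_0^{Ma^{-1}} \sum_{v=0}^{i} P_{i-v}^{(1)}\,p_v(\theta)\,d\theta
= \sum_{v=0}^{i} P_{i-v}^{(1)}\,Q_v \qquad (i = 0, 1, \ldots, K-1);\\
\Pi_K^{(1)} &= aM^{-1}\int_0^{Ma^{-1}} \sum_{v=0}^{K} P_{K-v}^{(1)}\,q_v(\theta)\,d\theta,
\end{aligned} \tag{36}$$
where there is some similarity of form with (35).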
In the case of our numerical example (§V (2)(b)), where K = 9, $T = \tfrac{1}{2}$, M = a = 4, the
values of the $p_i$ in (35) are given by equation (33) for values of i = 0, 1, ..., 8; the values
of the $Q_v$ in (35) and (36) are given for v = 0, 1, ..., 8 by

$Q_0 = 0.245421$,   $Q_3 = 0.141633$,   $Q_6 = 0.027669$,
$Q_1 = 0.227106$,   $Q_4 = 0.092791$,   $Q_7 = 0.012784$,
$Q_2 = 0.190474$,   $Q_5 = 0.053718$,   $Q_8 = 0.005341$;

and the values of the $\{P_i^{(1)}\}$ are given by (34). Using these, we compute

$\Pi_0 = 0.033214$,   $\Pi_0^{(1)} = 0.036490$,
$\Pi_1 = 0.097164$,   $\Pi_1^{(1)} = 0.102810$,
$\Pi_2 = 0.153677$,   $\Pi_2^{(1)} = 0.160212$,
$\Pi_3 = 0.176480$,   $\Pi_3^{(1)} = 0.181162$,
$\Pi_4 = 0.165573$,   $\Pi_4^{(1)} = 0.166619$,
$\Pi_5 = 0.134440$,   $\Pi_5^{(1)} = 0.132132$,
$\Pi_6 = 0.097291$,   $\Pi_6^{(1)} = 0.093169$,
$\Pi_7 = 0.063731$,   $\Pi_7^{(1)} = 0.059328$,
$\Pi_8 = 0.038132$,   $\Pi_8^{(1)} = 0.034385$,
$\sum_{i=9}^{\infty} \Pi_i = 0.040298$,   $\Pi_9^{(1)} = 0.033693$.




There would appear to be a close similarity in the values of the $\{\Pi_i\}$ and $\{\Pi_i^{(1)}\}$, so that we
could use the more easily computed values of $\Pi_i$ as approximations for the $\Pi_i^{(1)}$; however,
it is not possible without further work to state within what bounds this approximation
would be valid.
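The closeness of the two distributions can be examined with a few lines of code. The sketch below recomputes $\Pi_i$ from (35) and $\Pi_i^{(1)}$ from (36) for i = 0, ..., 8, taking the delivery-time probabilities quoted in (34) as data. The closed form used for $Q_v$ (one minus a Poisson distribution function, divided by M) is a standard identity introduced here, and the variable names are illustrative.

```python
from math import exp

def poisson_terms(mean, n_terms):
    """e^{-mean} mean^i / i! for i = 0..n_terms-1, built iteratively."""
    terms, term = [], exp(-mean)
    for i in range(n_terms):
        terms.append(term)
        term *= mean / (i + 1)
    return terms

def Q_values(M, v_max):
    """Q_v = (1/M) * integral_0^M e^{-u} u^v / v! du, via the Poisson(M) tail."""
    terms = poisson_terms(M, v_max + 1)
    Q, cdf = [], 0.0
    for v in range(v_max + 1):
        cdf += terms[v]
        Q.append((1.0 - cdf) / M)
    return Q

def convolve(weights, Q, i_max):
    """sum_{v=0}^{i} weights[i-v] * Q[v] for i = 0..i_max."""
    return [sum(weights[i - v] * Q[v] for v in range(i + 1)) for i in range(i_max + 1)]

M, aT = 4, 2.0
# Delivery-time probabilities as quoted in (34).
P1 = [0.148685, 0.281325, 0.277080, 0.177621, 0.080930,
      0.026833, 0.006382, 0.001036, 0.000103, 0.000005]

Q = Q_values(M, 8)
Pi_infinite = convolve(poisson_terms(aT, 9), Q, 8)   # (35), unrestricted deficit
Pi_finite = convolve(P1, Q, 8)                       # (36), i = 0..K-1 with K = 9
largest_gap = max(abs(x - y) for x, y in zip(Pi_infinite, Pi_finite))
```

For these values `largest_gap` should be well under 0.01, in line with the tabulated figures, although this gives no general bound on the approximation.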


(3) Provisioning with orders at fixed times $t = kMa^{-1}$, and replacements at
fixed times $t = (k+1)Ma^{-1}$ (k = 1, 2, ...), with time-lag $Ma^{-1}$
(a) General equations, and an exact solution for the case of consumption with a discrete
geometric distribution
This is a special case of the previous problem (§V (2)(a)), when the times for delivery of
the replacements and for ordering coincide; that is, a new order for replacements is put
in every time a delivery occurs, at time intervals $Ma^{-1}$. The consumption function is of the
same form (25), the replacement function is $r_2(t)$, (4), the overflow function is defined by (26),
where the time-lag T is $Ma^{-1}$; in these conditions, with the same notation as before, the
equations (28) relate the probability distributions $\{P_i\}$ and $\{P_i^{(1)}\}$ for S(t) at consecutive
ordering and replacement times $kMa^{-1}$, and $(k+1)Ma^{-1}$ (k = 1, 2, ...). To obtain the
stationary probability distribution $P_0, P_1, \ldots, P_K$, for these times, we write $P_i^{(1)} = P_i$
(i = 0, 1, ..., K), and solve the equations P = pP together with $\sum_{i=0}^{K} P_i = 1$.
If, for convenience, we group these equations in a slightly different way in the top half,
we have
$$\begin{aligned}
P_0 &= p_0(P_0 + \ldots + P_{K-1}) + q_0 P_K,\\
P_1 &= p_1(P_0 + \ldots + P_{K-2}) + q_1 P_{K-1},\\
&\;\;\vdots\\
P_K &= q_K P_0,\\
1 &= P_0 + P_1 + \ldots + P_K,
\end{aligned} \tag{37}$$

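which can be solved in pairs, starting with $P_0$ and $P_K$, and working towards the centre of
the group in the order $P_1$, $P_{K-1}$; $P_2$, $P_{K-2}$; etc. We obtain the recurrence relations
$$\begin{aligned}
P_0 &= p_0(1 - q_1 q_K)^{-1},\\
P_K &= q_K P_0 = q_K p_0(1 - q_1 q_K)^{-1},\\
P_1 &= \{p_1(1 - P_K) + q_2\,p_{K-1}P_0\}(1 - q_2 q_{K-1})^{-1},\\
P_{K-1} &= p_{K-1}P_0 + q_{K-1}P_1,
\end{aligned}$$
for the first two pairs of solutions. Assuming that all probabilities $P_n$ from $P_0, P_1, \ldots$ to $P_{i-1}$,
and $P_K, P_{K-1}, \ldots$ to $P_{K-i+1}$ are known, the following formulae enable all other $P_n$ to be found
from these by a repetition of the process:
$$\begin{aligned}
P_i &= \Big\{p_i\Big(1 - \sum_{n=K-i+1}^{K} P_n\Big) + q_{i+1}\,p_{K-i}\sum_{n=0}^{i-1} P_n\Big\}(1 - q_{i+1}q_{K-i})^{-1},\\
P_{K-i} &= p_{K-i}\sum_{n=0}^{i-1} P_n + q_{K-i}P_i \qquad (i = 1, \ldots, K-1).
\end{aligned} \tag{38}$$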
A distribution $\{p_i\}$ which gives a simple explicit solution of these equations is the
geometric; it is important to note, however, that this does not satisfy the independence
condition for non-overlapping intervals of time postulated in (25). For the geometric distribution
$$p_i = AB^i \qquad (i = 0, 1, \ldots), \tag{39}$$
where A + B = 1, we find from the recurrence relations (38) that the stationary distribution
of the deficit $\{P_i\}$ is given by
$$P_i = AB^i(1 - B^{K+1})^{-1} \qquad (i = 0, 1, \ldots, K). \tag{40}$$

infinity in such a way that the maximum deficit $K\Delta$ tends to the value $k$, and a deficit,
consumption or replacement $i\Delta$ tends to the value $s$ $(i = 0, 1, \ldots, K)$. The distribution of
the consumption then becomes continuous; for if we put


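$$A = 1 - e^{-\mu\Delta}, \qquad B = e^{-\mu\Delta},$$
and let $\Delta \to 0$ in (39), so that $i\Delta \to s$, we have that the distribution of the consumption in the
interval of time $Ma^{-1}$ is
$$p(s)\,ds = \lim_{\Delta\to0}\Delta^{-1}p_i\,ds = \lim_{\Delta\to0}\Delta^{-1}e^{-\mu i\Delta}(1 - e^{-\mu\Delta})\,ds = \mu e^{-\mu s}\,ds,$$
which is continuous, and of the exponential form. The solutions (40) of equations (37) will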
1 EL : 


^ = шети ds; 
(2) a replacement function such that
$$r((n+1)Ma^{-1}) = c(nMa^{-1}),$$
where irregular quantities lying between the values $(s, s + ds)$ equal to the consumption in
the previous time interval $Ma^{-1}$ are ordered at times $nMa^{-1}$ and delivered and added to
the stock at times $(n+1)Ma^{-1}$;

(3) an overflow function such that in any time interval $Ma^{-1}$ between ordering and
delivery times $nMa^{-1}$, $(n+1)Ma^{-1}$, the consumption is lost at any time after the deficit


$$f(s)\,ds = \lim_{\Delta\to0}\Delta^{-1}P_i\,ds = \mu e^{-\mu s}(1 - e^{-\mu k})^{-1}\,ds \quad (0 \le s \le k), \tag{42}$$
a truncated exponential distribution.
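
The passage from a truncated geometric distribution to the truncated exponential density (42) can be checked numerically. The sketch below assumes the normalised form $P_i = AB^i/(1 - B^{K+1})$ with $A = 1 - e^{-\mu\Delta}$, $B = e^{-\mu\Delta}$ and $K\Delta = k$; the function names and the values chosen for $\mu$ and $k$ are illustrative and are not taken from the paper.

```python
import math


def truncated_geometric(mu, k, K):
    """Return (step, P) with P_i = A * B**i / (1 - B**(K + 1)) and K * step = k (assumed form)."""
    step = k / K
    B = math.exp(-mu * step)
    A = 1.0 - B
    norm = 1.0 - B ** (K + 1)
    return step, [A * B ** i / norm for i in range(K + 1)]


def truncated_exponential(mu, k, s):
    """Density mu * exp(-mu * s) / (1 - exp(-mu * k)) on 0 <= s <= k."""
    return mu * math.exp(-mu * s) / (1.0 - math.exp(-mu * k))


mu, k = 1.5, 2.0                           # illustrative values only
for K in (10, 100, 1000):
    step, P = truncated_geometric(mu, k, K)
    i = K // 2                             # compare at the mid-point s = k / 2
    print(K, P[i] / step, truncated_exponential(mu, k, i * step))
```

As $K$ grows, the scaled probability $P_i/\Delta$ printed in the second column should approach the density in the third.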


This solution can also be obtained by taking the limit as $\Delta$ tends to zero for equations (37);
this gives an integral equation for the continuous stationary probability distribution of 


the deficit, k-s 
де) = pto)de f, Iau + fes) foa. (43) 
It is easily verified that (42) will satisfy this equation. 


VI. AN EXACT SOLUTION OF THE STORAGE PROBLEM WITH A POISSON INPUT 


In attempting to find algebraic solutions for the stationary probability distribution $\{P_i\}$ of
the dam content $Z(t)$ in Moran's problem (§ II), where $\mathbf{P} = \mathbf{p}\mathbf{P}$, $\mathbf{P}$ being the column vector
of the $P_i$, it is found that no simple solution in an explicit form exists for an input function
of the Poisson type with parameter $a$, such that in the interval of time $t$, $t+1$, the
probability of an input $X(t)$ of $i$ units is
$$p_i = e^{-a}a^i/i! \quad (i = 0, 1, \ldots).$$

However, if in this problem the conditions for the input of Poisson type and for the overflow 
are left unchanged, but the release rule is altered so that instead of the release rule (3) of
§ II ($M$ units at times $t+1$ if the total content of the dam $Z(t) + X(t)$ is greater than or equal to $M$,
or the total content of the dam Z(t) + X(t) if this is less than M), a new release rule is pre- 
scribed so that there is an output with a steady flow when the dam contains water, and no 
output when it is empty, then Z(t) is continuous, and it is possible to obtain solutions in 
an explicit form for its stationary distribution. 

Our method of approach will be to consider Moran's set of equations (5) for an infinitesimal
interval of time $\delta t$, instead of a unit interval of time; the input will remain of Poisson type,
and be a multiple of a definite discrete unit which will not tend to zero, while a small discrete
release will be made at the end of each infinitesimal interval of time under a rule of the same
type as rule (3) of § II. The equations can then be solved explicitly, and when $\delta t \to 0$, these
solutions will provide the continuous stationary probability distribution for the dam content
$Z(t)$, which is then itself continuous, when the release rule prescribed gives an output with
a steady flow when the dam contains water, or no output when the dam is empty.

Before proceeding further, it is preferable, since the theory of dams was identified with 
the general storage theory, to view the problem as one of the general storage function. It 
then becomes possible to interpret the problem as one of provisioning as well; here we would 
have a consumption of Poisson type, a replacement rule such that there is continuous
delivery at a steady rate when the stock is not at its maximum value, and zero when it is, 
and an overflow function such that the consumption is lost when there is no stock to meet it. 
In practice, this could conceivably be the case of a stock of grain in a silo, where the replace- 
ment is a steady flow of grain which is stopped when the silo is full, and the consumption of 
Poisson type is lost when the silo is empty, and met from the stock otherwise. 

We frame the conditions of the problem for the storage function as follows: a discrete
storage function $S(t)$ is defined in the range $0 \le S(t) \le (K - 1)\Delta$, where $K = bH + U$ ($b$, $H$, $U$
integers) at the fixed times $t = 0, \delta t, 2\,\delta t, \ldots$, and is subject to the conditions that

(1) it is fed by a discrete input function $I(t)$ of Poisson type with parameter $a$, such that
in a small interval of time $\delta t$ there may be inputs of $H\Delta$ or of no units respectively with
probabilities
$$p = \Pr\{I(t + \delta t) - I(t) = H\Delta\} = 1 - e^{-a\,\delta t},$$
$$q = \Pr\{I(t + \delta t) - I(t) = 0\} = e^{-a\,\delta t};$$




$$S(n\,\delta t) + I((n+1)\,\delta t) - I(n\,\delta t) \ge \Delta,$$


(3) the overflow function is such that in the interval $n\,\delta t$, $(n+1)\,\delta t$ there is no overflow if
$$S(n\,\delta t) + I((n+1)\,\delta t) - I(n\,\delta t) \le K\Delta,$$
but there is an overflow
$$S(n\,\delta t) + I((n+1)\,\delta t) - I(n\,\delta t) - K\Delta,$$
if this is greater than zero.
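Writing $P_i$, $P_i^{(1)}$ for the probabilities that $S(t)$ takes the values $i\Delta$ at times $n\,\delta t$, $(n+1)\,\delta t$
respectively, where $i = 0, 1, \ldots, K-1$, we have that the relations between these
probabilities are
$$\begin{aligned}
P_0^{(1)} &= q(P_0 + P_1),\\
P_i^{(1)} &= qP_{i+1} \quad (i = 1, \ldots, H-2),\\
P_i^{(1)} &= qP_{i+1} + pP_{i-H+1} \quad (i = H-1, \ldots, K-2),\\
P_{K-1}^{(1)} &= p(P_{K-H} + \ldots + P_{K-1}),
\end{aligned} \tag{44}$$
or in matrix form, $\mathbf{P}^{(1)} = \mathbf{p}\mathbf{P}$, where $\mathbf{P}$, $\mathbf{P}^{(1)}$ are column vectors with elements $P_i$, $P_i^{(1)}$
respectively, and $\mathbf{p}$ is the matrix of coefficients.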

Formally, this is Moran's set of equations (5), in which we have set $p_0 = q$, $p_H = p$, and
all other $p_i = 0$ for $i$ not equal to 0 or $H$. For the stationary distribution of $S(t)$, we put
$P_i^{(1)} = P_i$, and solve the equations $\mathbf{P} = \mathbf{p}\mathbf{P}$, together with $\sum_{i=0}^{K-1} P_i = 1$. We see clearly that the
solution of these equations will fall into the $(b+1)$ distinct classes:
(0)th class: $P_1, P_2, \ldots, P_{H-1}$;
(1)th class: $P_H, P_{H+1}, \ldots, P_{2H-1}$;


The solution for the 0th class is immediately obvious from (44):
$$P_R = pP_0q^{-R} \quad (R = 1, 2, \ldots, H-1); \tag{45}$$
for the $j$th class we have
$$P_{jH+R} = q^{-(R+1)}P_{jH-1} - p\sum_{N=0}^{R}q^{-(R-N+1)}P_{(j-1)H+N} \quad (j = 1, \ldots, b;\ R = 0, 1, \ldots, H-1), \tag{46}$$
relating any probability $P_{jH+R}$ in the $j$th class with the probabilities $P_{(j-1)H}, \ldots, P_{jH-1}$ of
the $(j-1)$th class, so that starting with the solutions (45) for the 0th class, we can obtain
solutions progressively for one class after another.
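
The class-by-class solution can be illustrated by a small numerical sketch. It assumes a particular reading of the scheme: in each interval an input of $H$ units arrives with probability $p$ or none with probability $q$, anything above $K$ units overflows, and one unit is then released whenever the content is at least one unit. Under that assumption the balance equation at state $i$ gives $P_{i+1}$ from earlier probabilities, which is the step-by-step form of the recursion. The function names and parameter values are hypothetical.

```python
def stationary_by_recursion(p, H, K):
    """Solve the balance equations forwards from an unnormalised P_0 = 1, then normalise."""
    if not (0.0 < p < 1.0 and 2 <= H < K):
        raise ValueError("need 0 < p < 1 and 2 <= H < K")
    q = 1.0 - p
    P = [1.0, p / q]                       # P_1 follows from P_0 = q (P_0 + P_1)
    for i in range(1, K - 1):              # balance at state i yields P_{i+1}
        inflow = p * P[i - H + 1] if i - H + 1 >= 0 else 0.0
        P.append((P[i] - inflow) / q)
    total = sum(P)
    return [x / total for x in P]


def one_step(P, p, H):
    """Apply one interval of the assumed scheme: input, overflow at K, release of one unit."""
    K, q = len(P), 1.0 - p
    new = [0.0] * K
    for i, mass in enumerate(P):
        for units, prob in ((0, q), (H, p)):
            content = min(i + units, K)    # anything above K units overflows
            if content >= 1:               # release one unit if there is one to release
                content -= 1
            new[content] += prob * mass
    return new


p, H, K = 0.3, 3, 11                       # illustrative values: K = bH + U with b = 3, U = 2
P = stationary_by_recursion(p, H, K)
drift = max(abs(a - b) for a, b in zip(P, one_step(P, p, H)))
print("largest change after one step:", drift)
print("P_1 / P_0 =", P[1] / P[0], " p / q =", p / (1 - p))
```

The forward recursion needs only earlier probabilities at each step; the final balance equation at state $K-1$ is not used in the construction and so serves as a check on the result.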

Consider the formula 


рага Partt ange (7 T 1)4 )م‎ arc 

(47) 

where $R = 1, 2, \ldots, H-1$ for $j = 0$, and $R = 0, 1, \ldots, H-1$ for $j = 1, 2, \ldots, b$, and where it is
understood that $\binom{n}{r} = 0$ if $r < 0$, or if $n < r$. For $j = 0$, this gives
$$P_R = pP_0q^{-R} \quad (R = 1, 2, \ldots, H-1),$$
which is the solution (45) for the (0)th class. We prove that formula (47) holds for all values
of $j$ by induction; assume that it holds for the class $(j-1)$, then substituting for $P_{(j-1)H+N}$
and $P_{jH-1}$ given by (47) in the equation (46), we obtain


ER EC py xm ii UE oper 


-PPri Y Toe y pigeat 
«(6 i-a een, م‎ 1- sitet "n 


#1 
b nnm d X (epg (P e 


+3 кй »(! jon ?. £ (0 Sh gah tk: ?) 


scene C; A n 


Now in (48), consider 


(rud 


"s t Hp nl 
Nr à IM Е 45 >т [К 1 , 
then 
Ga 2 iade ngo?) 
ore dat Ee) y (г 


E 


LEM 



In exactly the same way, we reduce the other terms in (48), so that 
Q-v)H-«v-2, а (G-v) H4+NGy_2 _ (0-9) H- Re v-1 
Marb aay )-( 


N=0 0—1 v 


R Те) 205 ls Dos ns. 
да PA j-2 +P >, j-1 2—1 5 2 


, 


We thus obtain for Pap in (48) the following expression: 
Ви+в = pPy tum 2 | 
j £ ын j-e)H e 
x [i+ E (чау v)H - Rv ') «(0 v)H - Rv JJ). (49) 
v=1 a? ' 


idea of their structure: 
Ikr =pPg® (R= 1,2,...,H—-1); 


Pair = phe e +10), 


енген re) „ы (A) 


R+1 
Fanner re) (rene ft) 


x Leges | Oe ) *»^ 15) (R = 0,1,...,H~1). 


For the complete solution in any particular case, the value of $P_0$ is also required; this can
be found from the equation
$$\sum_{i=0}^{K-1}P_i = 1.$$
We now proceed to the continuous case by letting $\delta t \to 0$, and consequently $\Delta = C\,\delta t \to 0$,
where the constant $C$ can now be interpreted as the rate of output per unit time. We allow
$\Delta$ to tend to zero, and the previously defined integers $K$, $H$ and $R$ to tend to infinity in
such a way that


range of values, $0 \le S(t) \le k$, where $k = bh + u$, and is defined for all time $t$ ($t$ continuous)


$$p = \Pr\{I(t + \delta t) - I(t) = h\} = 1 - e^{-a\,\delta t} = 1 - e^{-\mu\Delta},$$
$$q = \Pr\{I(t + \delta t) - I(t) = 0\} = e^{-a\,\delta t} = e^{-\mu\Delta},$$
where $\mu = a/C$.




(2) the output is such that it has a constant rate C per unit time, provided the storage 
function is not zero, but is zero if the storage function is zero; 
(3) the overflow is such that in any interval of time $\delta t$ there is no overflow if
$$S(t) + I(t + \delta t) - I(t) \le k,$$
but there is an overflow
$$S(t) + I(t + \delta t) - I(t) - k$$
if this is greater than zero.

We write the continuous stationary probability density for $S(t)$ as $f(s)$, so that
$$f(s)\,ds = \Pr\{s < S(t) < s + ds\}.$$
Now for $s = \lim_{\Delta\to0}(jH + R)\Delta = jh + r$, we have that
$$f(s)\,ds = \lim_{\Delta\to0}P_{jH+R}\,\Delta^{-1}\,ds;$$
this enables us to obtain from (45) and (49) the equations for $f(s)$ in various ranges of
magnitude $h$. For $j = 0$ $(0 < r < h)$ we have
$$f(r) = \lim_{\Delta\to0}\Delta^{-1}pP_0q^{-R} = \lim_{\Delta\to0}P_0\Delta^{-1}(1 - e^{-\mu\Delta})e^{\mu\Delta R} = \mu e^{\mu r}P_0. \tag{50}$$
For $j = 1, 2, \ldots, b$ and $0 < r < h$, we have from (49)
$$f(jh + r) = \lim_{\Delta\to0}\Delta^{-1}P_{jH+R}
= \mu P_0\Bigl[e^{\mu(jh+r)} + \sum_{v=1}^{j}(-1)^v e^{\mu\{(j-v)h+r\}}\Bigl\{\frac{\mu^{v-1}\{(j-v)h+r\}^{v-1}}{(v-1)!} + \frac{\mu^{v}\{(j-v)h+r\}^{v}}{v!}\Bigr\}\Bigr]. \tag{51}$$

In addition to these probability densities, which are continuous in the intervals in which
they are defined, there is also a concentration of probability $P_0$ at $s = 0$. To complete the
solution, we require the value of $P_0$; this will be given in any particular case where the value
of $k = bh + u$ is known by
$$P_0 + \sum_{j=0}^{b-1}\int_{r=0}^{h}f(jh + r)\,dr + \int_{r=0}^{u}f(bh + r)\,dr = 1. \tag{52}$$
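We write out in full some of the formulae for the $f(jh + r)$, in order to indicate their structure:
$$\begin{aligned}
f(r) &= \mu e^{\mu r}P_0,\\
f(h + r) &= \mu e^{\mu(h+r)}P_0 - \mu e^{\mu r}P_0(1 + \mu r),\\
f(2h + r) &= \mu e^{\mu(2h+r)}P_0 - \mu e^{\mu(h+r)}P_0\{1 + \mu(h + r)\} + \mu e^{\mu r}P_0\Bigl(\mu r + \frac{\mu^2r^2}{2!}\Bigr),\\
f(3h + r) &= \mu e^{\mu(3h+r)}P_0 - \mu e^{\mu(2h+r)}P_0\{1 + \mu(2h + r)\} + \mu e^{\mu(h+r)}P_0\Bigl\{\mu(h + r) + \frac{\mu^2(h + r)^2}{2!}\Bigr\}\\
&\quad - \mu e^{\mu r}P_0\Bigl(\frac{\mu^2r^2}{2!} + \frac{\mu^3r^3}{3!}\Bigr) \quad (0 < r < h).
\end{aligned} \tag{53}$$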


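It is interesting to note that these formulae for $f(jh + r)$ $(j = 1, 2, \ldots, b)$ can also be obtained
from the limit of equation (46), when $\Delta \to 0$,
$$f(jh + r) = \lim_{\Delta\to0}P_{jH+R}\,\Delta^{-1}
= \lim_{\Delta\to0}\Delta^{-1}\Bigl[P_{jH-1}q^{-(R+1)} - p\sum_{N=0}^{R}q^{-(R-N+1)}P_{(j-1)H+N}\Bigr]. \tag{54}$$
Now for $j = 1$, we have
$$f(h + r) = \lim_{\Delta\to0}P_{H+R}\,\Delta^{-1}
= \lim_{\Delta\to0}\Bigl[\Delta^{-1}P_{H-1}q^{-(R+1)} - p\Delta^{-1}\sum_{N=1}^{R}q^{-(R-N+1)}(P_N\Delta^{-1})\Delta - p\Delta^{-1}q^{-(R+1)}P_0\Bigr]$$
$$= e^{\mu r}f(h - 0) - \mu e^{\mu r}\Bigl[\int_0^r e^{-\mu\eta}f(\eta)\,d\eta + P_0\Bigr], \tag{55}$$
since $P_N\Delta^{-1} \to f(\eta)$, where $N\Delta \to \eta$ for all $N \ne 0$, but there is a concentration of probability
$P_0$ at $s = 0$. For $j = 2, 3, \ldots, b$ the equation (54) gives
$$f(jh + r) = e^{\mu r}f(jh - 0) - \mu e^{\mu r}\int_0^r e^{-\mu\eta}f((j-1)h + \eta)\,d\eta. \tag{56}$$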


The formulae (53) can be obtained by putting the formula (50) for $f(r)$ in (55) and
integrating it to find $f(h + r)$, then repeating this process for $f(jh + r)$ by using (56). More
generally, it can be proved by induction that our solutions (50) and (51) satisfy the integral
equations (55) and (56).

It is found that (52) does not lend itself to the computation of $P_0$ as well as an alternative
equation for the integral of the probabilities, which we now obtain. To do so, we write the
probability density $f(s)$ for the various intervals in the following manner:


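$$f(s) = \mu e^{\mu s}P_0 \quad (0 < s < h);$$
$$f(s) = \mu P_0\Bigl[e^{\mu s} + \sum_{v=1}^{j}(-1)^v e^{\mu(s-vh)}\Bigl\{\frac{\mu^{v-1}(s - vh)^{v-1}}{(v-1)!} + \frac{\mu^{v}(s - vh)^{v}}{v!}\Bigr\}\Bigr]
\quad (jh < s < (j+1)h;\ j = 1, 2, \ldots, b).$$
The equation (52) can then be written as
$$P_0\Bigl(1 + \int_0^{bh+u}\mu e^{\mu s}\,ds + \sum_{j=1}^{b}\int_{s=jh}^{bh+u}(-1)^j\mu e^{\mu(s-jh)}\Bigl\{\frac{\mu^{j-1}(s - jh)^{j-1}}{(j-1)!} + \frac{\mu^{j}(s - jh)^{j}}{j!}\Bigr\}ds\Bigr) = 1,$$
which shortens considerably the work involved in the computation of $P_0$.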
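
As a rough numerical companion to this computation, the sketch below evaluates $f(s)/P_0$ from an alternating-sum form of the density and obtains $P_0$ from the normalisation by Simpson's rule. The alternating-sum expression used in `density_ratio` is an assumed reading of the formulae for $f(s)$; the function names, the quadrature rule and the parameter values are illustrative choices, not part of the paper.

```python
import math


def density_ratio(s, j, mu, h):
    """f(s) / P_0 on the j-th interval jh < s < (j+1)h, assumed alternating-sum form."""
    total = 0.0
    for v in range(j + 1):
        x = mu * (s - v * h)
        term = x ** v / math.factorial(v)
        if v >= 1:
            term += x ** (v - 1) / math.factorial(v - 1)
        total += (-1) ** v * math.exp(x) * term
    return mu * total


def simpson(func, a, b, n=200):
    """Composite Simpson rule with an even number n of sub-intervals."""
    width = (b - a) / n
    acc = func(a) + func(b)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * func(a + i * width)
    return acc * width / 3.0


def atom_at_zero(mu, h, k):
    """P_0 from the normalisation P_0 * (1 + integral of f / P_0 over (0, k)) = 1."""
    total, j = 0.0, 0
    while j * h < k:                       # integrate interval by interval
        upper = min((j + 1) * h, k)
        total += simpson(lambda s, j=j: density_ratio(s, j, mu, h), j * h, upper)
        j += 1
    return 1.0 / (1.0 + total)


mu, h, k = 0.8, 1.0, 2.5                   # illustrative values: b = 2, u = 0.5
print("P_0 =", atom_at_zero(mu, h, k))
```

Integrating interval by interval keeps the jump of the density at $s = h$ out of any single quadrature panel. When $k \le h$ only the first interval contributes and the result reduces to $P_0 = e^{-\mu k}$.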


I am greatly indebted to Prof. P. A. P. Moran for suggesting several of the problems, and
for his extremely helpful discussion and criticism of the work throughout all stages. 


REFERENCES 
FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: John
FRAZER, R. A., DUNCAN, W. J. & COLLAR, A. R. (1947). Elementary Matrices. Cambridge University
MORAN, P. A. P. (1954). A probability theory of dams and storage systems. Aust. J. Appl. Sci.
5, no. 2, pp. 116-24.
MORAN, P. A. P. (1955). A probability theory of dams and storage systems. II (to appear).
PITT, H. R. (1946). A theorem on random functions with applications to a theory of provisioning.
J. Lond. Math. Soc. 21, 16-22.




APPROXIMATE CONFIDENCE INTERVALS 
III. A BIAS CORRECTION 


By M. S. BARTLETT
University of Manchester 


In my second paper on approximate confidence intervals (Bartlett, 1953, to be referred to
as II), where a general method of eliminating 'nuisance parameters' was discussed, it was
claimed (see top of p. 310) that the effect of substituting maximum-likelihood estimates
for the nuisance parameters could be neglected to the first order ($O(1/\sqrt{n})$) of approximation.
I regret to say that this assertion is not in general correct, and the statistic $T_1$ obtained after
such substitution may have a bias of $O(1/\sqrt{n})$ which requires correction. It is even necessary
in the expansion
$$T_1 = T + (\hat\theta_2 - \theta_2)\frac{\partial T}{\partial\theta_2} + \tfrac12(\hat\theta_2 - \theta_2)^2\frac{\partial^2T}{\partial\theta_2^2} + \ldots \tag{1}$$
to make allowance not only for the second term, but also for the third term, on the right-hand
side of (1), or to use the equivalent approximation to this order given in equation (14) of II.
For convenience we may write (cf. equation (11) of II)
$$T = a\frac{\partial L}{\partial\theta_1} + b\frac{\partial L}{\partial\theta_2}, \tag{2}$$
where $a = 1/\sqrt{I_{11.2}}$, $b = -I_{12}/(I_{22}\sqrt{I_{11.2}})$. Then, noting that $\partial T/\partial\theta_2$ is $O(1)$, and $\partial^2T/\partial\theta_2^2$ $O(\sqrt{n})$,
we find, omitting any terms of smaller order of magnitude than $1/\sqrt{n}$,


1 ðL eL eL La ƏL ob 
TT = 1.59, ^ 20,00, *” 20% * 00,00, 00,20 


1 әгл „| 9L GL .. a . 9 
a os. [ая + | 2,9: (3) 


Hence, making use of relations such as 
\ s eL | | eL) ls 4) 


30,00,00,) ^ (00,008) 00,” 


we obtain from (3) Ят Ar 
ELA [^ Boe E 
щт) - | ** 0 


| mla] oG) 
-iaa +0(2). (6) 
When Л» = 0, this reduces to 


E(T)- - Von | an) (6) 


For reference, the generalizations of (3) and (5) in the case of several nuisance parameters
$\theta_i$ $(i = 2, \ldots, m)$ are
2 Lf, OL L LIA д1 =] 
=T m Jik — | A р 4 کچ‎ 
Ay [4 70,00, * 820,00, * 00,00, 00,90, 


NT FL | eL \-2 24 gr 23 : 
+H pre [asa +88 60,60,00,| F0, 3*00, |” Ga 




where _ ,oL oL 
T=A 96, + В, 20, R 
say, and Ж Е 
eL kh aa [ X EA kh “с 
E(T} ~ – Mr.) ea e| B Е| aa Ге erp. (2 
the summation convention being understood for $g, h, i, j, k$ $(2, \ldots, m)$, and $I^{ij}$ being the inverse
of $I_{ij}$.


Provided $T_1$ is corrected for its bias, it is found further from the formulae (3) and (3a)
that its second and third moments are then the same as those of $T$ to $O(1/\sqrt{n})$.

Examples. (i) and (ii). The two examples discussed in §§ 6 and 7 of II are not affected by
this correction. In the analysis of variance problem of § 6, $I_{12} \ne 0$, but it will be found that
$E(T_1) \sim 0$ from (5) above. (I have learnt that more direct investigations of the asymptotic
confidence interval for this problem, using the methods first developed by Dr B. L. Welch
in the Behrens-Fisher problem, have recently* been made by J. R. Green and by A. Huitson;
and, in particular, Dr Huitson has checked the consistency of my solution with his own
results.)

Similarly, no bias correction arises in the time-series problem of § 7, though there is
unfortunately a further misstatement in equation (23), which should be corrected. The
maximum-likelihood estimate of $\sigma^2 = \alpha$, say, should have been given as


$$\hat\alpha = \frac{1}{n}\sum(X_r - \beta X_{r-1})^2. \tag{7}$$
The confidence interval from (24) then becomes 
SO Lo ET а 
уйа +1+03) (8) 


(cf. my contribution to the discussion at the Royal Statistical Society Symposium on 
Interval Estimation, 12 May 1954). 

(iii) To illustrate the use of the bias correction where it is not zero, consider the last
example above when the mean value $m$ of $X$ is not zero and requires estimation. The log
likelihood function (apart from end-corrections of relative order $1/n$) is now
$$L = -\tfrac12 n\log\alpha - \tfrac12\sum_{r=1}^{n}Y_r^2/\alpha, \tag{9}$$
where
$$Y_r = (X_r - m) - \beta(X_{r-1} - m). \tag{10}$$
The $L$ derivatives with respect to $\alpha$ and $\beta$ are as before, with $x_r = X_r - m$ in place of $X_r$. Also
eL (1-8) а 


Am uL E Т] 


fh 5 X,/n, 
rel 
ы» = (1—f)P nla, In = 0, Lom =0, 
aL EDITT 
pom? = 2n(1— f)Ja = Е fmi 
È (Xp— AX) X7 à 
Eam Dal 
s inj — /%)} ў 


* See references. 


Hence the quantity 
(11) 


where X? = X, — fh, and & is like @ in (7) but with X, in place of X,, has bias 
1 


KT} ~ – aul,’ 
and the equation for / becomes (>A ie 
41-25 _ B=!) _ 
T+ Ar ДО) i^ (12) 
or p= As a - y 19 оба) (13) 


in place of (8), where now 
^ n LJ 
p= È хх, [ È ax. 
rel rel 
(iv) Finally, in view of the above amendments, it seemed advisable to check these results
to $O(1/\sqrt{n})$ by means of a fairly complicated example for which the result is already known.
A useful case is that of the classical correlation coefficient $\rho$, with the two standard
deviations $\alpha$ and $\beta$ as nuisance parameters. We take, apart from an additive constant,
$$L = -n\log\{\alpha\beta\sqrt{(1 - \rho^2)}\} - \frac{n}{2(1 - \rho^2)}\Bigl[\frac{s_1^2}{\alpha^2} - \frac{2\rho rs_1s_2}{\alpha\beta} + \frac{s_2^2}{\beta^2}\Bigr], \tag{14}$$
where $r$, $s_1$, $s_2$ are the usual sample estimates of $\rho$, $\alpha$ and $\beta$ respectively (the means for
convenience are already eliminated). Then (cf. M. G. Kendall's Advanced Theory of Statistics,
vol. 2, ex. 17.18)


g = CA) ,2-P") ү n0 p) 

«agp ^""püu-py E hi 
he" Lou de 

ав ра)" "^ aü-py ^" fü-p) 

Tt is found further from (14) that 

GL) 2n(5—2p* eL —3np? 

(aa аса: Haa] дт 

& _ = 2пр(1—0?) x GL | _ то(1+0?) (16) 
байр) A- ' lapp aß — р?) 


| BL | _ 2n(1+p?) віз i —4np(3 + p?) 
ap a(1—9*)*" op? (1—02) ° 
the expressions for the remaining quantities with « and f interchanged being obvious from 


symmetry. 
The statistic T uncorrelated with д1]дж and 219 is 
„00.000, № 0) - 
1= (51 5538 * 059 99)] "rn an 


where 15,55 = n|(1—p3?. From the above formulae, and the general expression for 


oL oL oL oL Pd чы eme d 
ME er E t f II, it is found (afte 
E A in terms of |0820) and derivatives of J, given in L, 1 18 toun (after 
some algebra) that the skewness of $T$ in (17) is zero to $O(1/\sqrt{n})$.

We now replace $T$ by $T_1$, the maximum-likelihood estimates of $\alpha$ and $\beta$ ($\rho$ known) being


is ig) E m 


$T$ in (17) then reduces to
$$T_1 = \frac{\sqrt{n}\,(r - \rho)}{1 - \rho r}; \tag{19}$$
the functional form for $r$ in (19) should be noted. From the general bias formula (5a), we
find (noting that $I^{\alpha\alpha} = \tfrac14(2 - \rho^2)\alpha^2/n$, $I^{\alpha\beta} = \tfrac14\rho^2\alpha\beta/n$)
$$E(T_1) \sim \tfrac12\rho/\sqrt{n}. \tag{20}$$
Hence our confidence interval is given by 


yn(r—p)_ p TEE 
l—or- 94m “уп” 


(21) 
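which agrees with the known results
$$E(z) \sim \zeta + \tfrac12\rho/n, \qquad \sigma^2(z) \sim 1/n, \qquad \gamma_1(z) \sim 0,$$
where $z = \tfrac12\log\{(1 + r)/(1 - r)\}$, $\zeta = \tfrac12\log\{(1 + \rho)/(1 - \rho)\}$.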

While the above method is not of course required for this example, it is of some interest that 
the result (21) has been obtained merely by straightforward differentiation of the log likeli- 
hood function (14). It is, however, apparent that the algebra involved in getting the next 
term in the expansion becomes in general so intractable that a more direct attack on in- 
dividual problems is then usually the more promising. 
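
The agreement with the known results for $z$ can be seen from the identity $(r - \rho)/(1 - \rho r) = \tanh(z - \zeta)$, so that the statistic of (19) equals $\sqrt{n}\tanh(z - \zeta)$, which is close to $\sqrt{n}(z - \zeta)$ for small differences. The sketch below checks the identity numerically and evaluates the leading bias term, assuming (20) reads $E(T_1) \sim \tfrac12\rho/\sqrt{n}$; the function names and the numbers are illustrative.

```python
import math


def t1(r, rho, n):
    """Statistic (19): sqrt(n) (r - rho) / (1 - rho r)."""
    return math.sqrt(n) * (r - rho) / (1.0 - rho * r)


def t1_from_z(r, rho, n):
    """The same quantity written as sqrt(n) tanh(z - zeta), z = artanh r, zeta = artanh rho."""
    return math.sqrt(n) * math.tanh(math.atanh(r) - math.atanh(rho))


def approximate_bias(rho, n):
    """Leading bias term, equal to sqrt(n) times the bias rho / (2n) quoted for z."""
    return 0.5 * rho / math.sqrt(n)


r, rho, n = 0.62, 0.5, 40                  # illustrative values only
print(t1(r, rho, n), t1_from_z(r, rho, n), approximate_bias(rho, n))
```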


REFERENCES 


BARTLETT, M. S. (1953). Approximate confidence intervals. II. More than one unknown parameter.
Biometrika, 40, 306.
GREEN, J. R. (1954). A confidence interval for variance components. Ann. Math. Statist. 25, 671.
HUITSON, A. (1954). Ph.D. Thesis, University of Leeds.




THE THEORY OF CORRELATION BETWEEN TWO CONTINUOUS 
VARIABLES WHEN ONE IS DICHOTOMIZED


By ROBERT F. TATE 
University of Washington†


1. INTRODUCTION AND SUMMARY 


The problem of biserial correlation arises when one is sampling from a bivariate normal
population in which one of the variables has been dichotomized, giving rise to only two
observable values, say 0 and 1, and one wishes to use this dichotomized sample to estimate,
or to test hypotheses concerning, the correlation coefficient $\rho$ of the original bivariate
normal distribution. The problem of biserial correlation occurs frequently in psychological
work, especially in test construction and validation.

The term biserial correlation was introduced by Karl Pearson (1909), who was the first 
to perceive the statistical importance of this particular type of problem. He proposed as 
an estimator the sample biserial correlation coefficient. The asymptotic variance of this 
estimator was derived by Soper (1913). Much literature exists on the subject of how best 
to compute Pearson’s coefficient. In this connexion the reader should see Du Bois (1942), 
Dunlap (1936) and Royer (1941). 

Prof. Harold Hotelling realized some years ago that the existing methods for dealing 
with the problem of biserial correlation were far from satisfactory, and suggested to the 
author that the whole situation be reconsidered. The results of this examination are con- 
tained in the present paper. 

§ 2 contains a list of most of the notation which has been adopted, and § 3 deals with the
mathematical model. In § 4 the question of maximum likelihood is treated. Asymptotic
variances are derived for the estimators $\hat\omega$ and $\hat\rho$. The asymptotic variance for $\hat\rho$ is compared
with the approximate expression arrived at by Maritz (1953) when he considered a somewhat
restricted model. Both expressions are shown to achieve their minimum value at
$\omega = 0$ when $\rho$ is fixed.

Matters concerning asymptotic normality and asymptotic efficiency are also considered. 

An appraisal of $r^*$, the sample biserial correlation coefficient, is given in detail in § 5.
It is shown to have asymptotic efficiency for estimating $\rho$ which is 1 when $\rho = 0$, but which
approaches 0 when $|\rho|$ approaches 1. The well-known fact that $r^*$ may be greater than 1 is
pointed out and some notion of the magnitude of $r^*$ is obtained by a consideration of the
product-moment correlation coefficient $r$. Asymptotic normality of $r^*$ is verified by the use
of a theorem of Cramér. The asymptotic standard deviation is tabulated at the end of the
paper (Table 2). A proof is given for the customarily assumed fact that the asymptotic
variance has a minimum for fixed $\rho$ when $\omega = 0$. For the case $\omega = 0$ an approximate variance
stabilizing transformation is derived. Calculations pertaining to this transformation may


+ Part of this research was done under an Office of Naval Research contract at the Institute of 
Statistics, University of North Carolina. The balance was sponsored by the Office of Naval Research on 
the Navy Theoretical Statistics Project at the Laboratory of Statistical Research, University of 
Washington. 




be carried out by using a table (see Fisher, 1946, Table V B) for the function $\tanh^{-1} r$. This
result should prove useful in many situations. 

§ 6 is devoted to a discussion of an iterative method of solution for the likelihood equations.
The method is essentially Newton's method for two variables, the calculated values $\omega^*$,
$r^*$ being used to start the iteration. The computations are not really prohibitive, considering
the importance of the problem, and are to a certain extent organizable for punched-card
methods. An example is given with all of the calculations illustrated. Values of $\phi(x)$, the
reciprocal of Mills's ratio, are required for the solution of the likelihood equations. These
may be obtained from the tables published as a separate contribution, immediately following 
the present paper. 

Two matters of some importance which are not considered in the present paper are: 


(1) An investigation of the bias of $\hat\rho$.
(2) The numerical tabulation of the asymptotic variances of $\hat\omega$ and $\hat\rho$.


Further study is indicated on at least the second point. 


2. NOTATION 


To eliminate the distraction of searching through the text, we shall list here most of the 
symbols and notational devices used: 


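$\psi(x, y) = \dfrac{1}{2\pi\sqrt{(1 - \rho^2)}}\exp\Bigl\{-\dfrac{x^2 - 2\rho xy + y^2}{2(1 - \rho^2)}\Bigr\}$, the bivariate normal density.

$\lambda(x) = (2\pi)^{-1/2}e^{-x^2/2}$, the normal density.†

$p(x) = \int_x^{+\infty}\lambda(y)\,dy$, $\quad q(x) = 1 - p(x)$.

$\phi(x) = \lambda(x)/p(x)$, the reciprocal of Mills's ratio.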


+0 
Eo) = [^ Wenay. 
neo) =|" ies). 


$X$ the undichotomized normal random variable.

$Y$ the dichotomized normal random variable.

$\omega$ the point of dichotomy of $Y$, measured in standard units.

$Z$ the discrete random variable induced by the dichotomization of $Y$.

$f(x, z)$ the joint density of the random variables $X$ and $Z$.

$r^*$ the sample biserial correlation coefficient.

AV($r^*$) the asymptotic variance of $r^*$.

AEff($r^*$) the asymptotic efficiency of $r^*$ for estimating $\rho$.

$N(\mu, \sigma^2)$ a normal random variable with mean $\mu$ and variance $\sigma^2$.


† [Editorial Note. To bring this notation into conformity with that of the tables printed on pp. 217-
221 below and with that used in the recently published Biometrika Tables for Statisticians, vol. 1, it is
necessary to write $Z = \lambda(x)$, $Q = p(x)$, $P = 1 - p(x)$, so that $\phi(x) = Z/Q$ or $Z/P$ according as $x$ is
$> 0$ or $< 0$.]




3. MATHEMATICAL MODEL 
Let $(X, Y)$ have the bivariate normal distribution $\psi\{(x - \mu)/\sigma, (y - \nu)/\tau\}$. Let $Z$ be a
dichotomy of $Y$, with the point of dichotomy $\omega$ measured in standard units. Without losing any
generality we may set $\nu = 0$ and $\tau = 1$. $Z$ is thus a random variable which takes the value 1
when $Y \ge \omega$ and the value 0 when $Y < \omega$. Obviously,
$$P(Z = 1) = \int_\omega^{+\infty}\lambda(y)\,dy = p(\omega), \qquad P(Z = 0) = q(\omega). \tag{3-1}$$
Consider a sample of $n$ independent random vectors $(X_1, Z_1), (X_2, Z_2), \ldots, (X_n, Z_n)$. The
problem of biserial correlation consists of finding a suitable function of $(X_i, Z_i)$ $(i = 1, 2, \ldots, n)$
with which to estimate $\rho$.
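Karl Pearson (1909) introduced the estimator $r^*$ ('biserial $r$'), which we express in the
following form:†
$$r^* = \frac{\frac{1}{n}\sum(X_i - \bar X)(Z_i - \bar Z)}{\lambda(T)\bigl\{\frac{1}{n}\sum(X_i - \bar X)^2\bigr\}^{1/2}} = \frac{r\bigl\{\frac{1}{n}\sum(Z_i - \bar Z)^2\bigr\}^{1/2}}{\lambda(T)}, \tag{3-2}$$
where $r$ is the product-moment correlation coefficient of $(X_i, Z_i)$, and $T$ is the solution of
the equation‡
$$\int_T^{+\infty}\lambda(y)\,dy = \bar Z.$$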
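
A minimal sketch of the computation of $r^*$ follows, assuming the form $r^* = r\{\bar Z(1 - \bar Z)\}^{1/2}/\lambda(T)$ with $T$ defined by the upper-tail area $p(T) = \bar Z$. It uses the standard library class `statistics.NormalDist` for the normal density and its inverse distribution function; the function name `biserial_r` and the data are made up for illustration.

```python
import math
from statistics import NormalDist


def biserial_r(xs, zs):
    """Return (r, r_star): product-moment r of (X, Z) and the biserial coefficient r*."""
    n = len(xs)
    x_bar, z_bar = sum(xs) / n, sum(zs) / n
    if not 0.0 < z_bar < 1.0:
        raise ValueError("both values of Z must occur in the sample")
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs)) / n
    sxx = sum((x - x_bar) ** 2 for x in xs) / n
    std_normal = NormalDist()
    T = std_normal.inv_cdf(1.0 - z_bar)    # upper-tail area beyond T equals z_bar
    lam = std_normal.pdf(T)
    r = sxz / math.sqrt(sxx * z_bar * (1.0 - z_bar))
    r_star = sxz / (lam * math.sqrt(sxx))
    return r, r_star


xs = [1, 2, 3, 4, 5, 6]                    # made-up data
zs = [0, 0, 0, 1, 1, 1]
r, r_star = biserial_r(xs, zs)
print(r, r_star)                           # r_star exceeds 1 for this sample
```

For 0/1 data the sample variance of $Z$ with divisor $n$ equals $\bar Z(1 - \bar Z)$, which is why the two expressions for $r^*$ coincide.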


$r^*$ will be discussed completely in § 5. For the present we shall merely state the asymptotic
variance obtained by Soper (1913):
$$\mathrm{AV}(r^*) = \frac{1}{n}\Bigl[\rho^4 + \rho^2\Bigl\{\frac{pq\omega^2}{\lambda^2} + \frac{(2p - 1)\omega}{\lambda} - \frac{5}{2}\Bigr\} + \frac{pq}{\lambda^2}\Bigr], \tag{3-4}$$
where the functions $p$, $q$ and $\lambda$ all have argument $\omega$. $\sqrt{\{\mathrm{AV}(r^*)\}}$ is given in Table 2 at the end
of the paper. In view of symmetry about the values $\rho = 0$ and $p(\omega) = \tfrac12$, the tabulation is
given for $\rho = 0$ to 1 in steps of 0.10, and for $p = 0.05$ to 0.50 in steps of 0.05.

Since the random variable Z takes the value 0 or 1, the joint density of (Х, 2) can be 


ы /@,г) = 061) + 1297080), (85) 
"n дао f rents. feme [vem (36) 


with $\Psi(x, y)$ denoting the bivariate normal density, $\psi\{(x - \mu)/\sigma, y\}$, with means $\mu$ and 0,
variances $\sigma^2$ and 1, and correlation $\rho$. §§ 4 and 6 are devoted to a discussion of the likelihood
function $\prod f(x_i, z_i)$.


4. PROPERTIES OF THE MAXIMUM-LIKELIHOOD ESTIMATORS 


As the likelihood function stands it may be expressed as 
— X. — 
Ln, 0°, 0,p) = по) a^ ; oJ. (41) 


Maritz (1953) considered the restricted model with $\mu = 0$, $\sigma^2 = 1$. Using biserial data
$(X_i, Z_i)$ $(i = 1, 2, \ldots, n)$, he introduces a grouping of the $X$ observations, and then considers
† All $\sum$ and $\prod$ symbols with index $i$ will have limits 1 to $n$.
‡ The reason for this definition of $T$ will be apparent in § 5, when we show that $r^*$ is consistent and
asymptotically normal.




the observations to be concentrated at their respective cell mid-points. This leads to a neat 
solution of the problem by probit-analysis methods. A proof of the convergence of this 
method as the grouping becomes finer must depend on a close examination of the limiting 
processes involved. Specifically, it is necessary that as the cell width becomes small, and 
the sample size large, each cell must contain sufficiently many observations that the ratio
of the number of $X$ observations whose corresponding $Z$ observations are 1 to the number
of $X$ observations provides a valid approximation to the conditional probability that $Z = 1$.
Instead of attempting a discussion of this point, we shall derive the asymptotic variances
for the four-parameter problem (and, as a by-product, for the two-parameter problem),
and in § 6 discuss an iterative method for obtaining $\hat\omega$ and $\hat\rho$. This method, while more
time-consuming than that of Maritz, does not require grouping. It should be noted in this
connexion that Tocher's exact method (see Tocher, 1949, pp. 9-11), also known as the 'scoring'
method, does not help in this case, owing to the difficulty of obtaining expected second
partial derivatives of $L$.

The likelihood situation of Maritz differs from ours because of the fact that when $\mu$ and
$\sigma^2$ are set equal to 0 and 1 respectively, the customary method for obtaining asymptotic
variances of $\hat\omega$ and $\hat\rho$ by an inversion of the information matrix leads to smaller variances.
As far as the solution for $\hat\omega$ and $\hat\rho$ is concerned, the results will remain the same after a slight
transformation.†

It may be remarked without dwelling at any length on regularity conditions that those
given by Cramér (1946, p. 500) may be easily verified, since $f(x, 0)$ and $f(x, 1)$ are both
integrals of bivariate normal densities. Consequently, $\hat\mu$, $\hat\sigma^2$, $\hat\omega$ and $\hat\rho$ will be asymptotically
normal and asymptotically efficient estimators of the corresponding parameters. We now
use the information matrix technique to find the asymptotic variances of $\hat\omega$ and $\hat\rho$.


THEOREM I. The asymptotic variances of $\hat\omega$ and $\hat\rho$ are


[рой 


pnm t Ame ar аа за ا‎ АР S Pott + 2) 
To +o +o 2 Y 
“ере ей C 
+0 d 
ауу 220 fis | pa py 


Tre pre 7 prea + Nd 
EOAR 

æ, 0, p) = A(x s Et cd ) (E). 
aes op) =2 T=) N Ta gt 

Proof. Using expression (4-1) for $L$, and letting $\delta^2$ refer to any of the ten second-order
partial operators, we obtain the fundamental relation
$$E(\delta^2\log L) = nqE_0(\delta^2\log\eta) + npE_1(\delta^2\log\xi), \tag{4-2}$$
where $E_0$ and $E_1$ mean conditional expectation with respect to the conditional densities of
$X$ given $Y < \omega$ and $Y \ge \omega$ respectively.


† The author expresses his indebtedness to the referee for this fact, and for its proof which will be
given later.


For all of the possible operators $\delta^2$ the calculation of (4-2) proceeds in about the same
way. We compute, as an illustration,
e). 


It may be shown that 
V(z| У<о) = vC. v), ¥(z| Y >w) = se. v). 
After some differentiation we get 
?* log y toe Рх) 
ЧЕ) [rios 
a: н: -o (rz — u)jo, v) 


mE 255) =" "Ше = 


Combining the terms of (4-3) according to (4-2), making use of the relation 
+o 1 1 +o д 
[ео чаки TRA dz = IN TS 
(=F) (5) 


and performing similar computations for the expectations of the other second partials of 
log L, we arrive at the information matrix} 


(43) 


D e d Sat E ST r beL 
=p =F o(1—p?) o(1—p*) 

_—2роау+рш%ъ — a,p—p*oay —— asp—p*oa, 

(1—69)? ol- oi- 
‚ L-pttptay __ Pm , 
nore TAA 7941-05) 
X1 p!) plas 
P-e) 
kc VR f ада, o, p) da. (4-4) 


The asymptotic variances of $\hat\omega$ and $\hat\rho$, obtained after inversion, correspond with the
expressions of the theorem.

The two-parameter problem of Maritz, solved by considering the upper left 2 x 2
submatrix, yields asymptotic variances which are the leading terms of the expressions given
in the theorem; they coincide with his results.

The role played by $\omega$ in AV($\hat\rho$) is partially described by the following theorem:

THEOREM II. AV($\hat\rho$) is a minimum for each $\rho$ when $\omega = 0$.

Proof. For the case $\rho = 0$ the proof follows from Theorem IV of the next section. For the
case $\rho \ne 0$, make the transformation $(\omega - \rho x)(1 - \rho^2)^{-1/2} = y$ in the integrand of (4-4). AV($\hat\rho$)
can then be expressed as


where $b_i = E'[X^i\phi(X)\phi(-X)]$, with $E'$ denoting expectation with respect to the
distribution of $N[\omega(1 - \rho^2)^{-1/2}, \rho^2(1 - \rho^2)^{-1}]$. It can now be seen from considerations of symmetry


† Columns (from left to right) and rows (from top to bottom) correspond to $\omega$, $\rho$, $\mu$, $\sigma$.


that b and b, have maxima at w = 0, while b, has its minimum at w = 0, which completes 
the proof. j 


5. THE UTILITY OF r* 


We now present a series of results concerning $r^*$, which will be followed by a general
discussion of its value. Note that expression (3-2) for $r^*$ is invariant under a linear
transformation of the $X_i$, so in all results pertaining only to $r^*$, we set $\mu = 0$ and $\sigma^2 = 1$.

A little later we shall need $E(XZ)$. Since it is not difficult to obtain, we will give the
expression for the general moment $E(X^kZ)$.

THEOREM III.
$$E(X^kZ) = \sum_{j=0}^{k}\binom{k}{j}\rho^{k-j}(1 - \rho^2)^{j/2}a_j\int_\omega^{+\infty}y^{k-j}\lambda(y)\,dy,$$
where $a_j$ is the $j$-th moment of the random variable $N(0, 1)$.

Proof. Using the definition of $E_1$, we obtain
$$E(X^kZ) = pE_1(X^k) = \int_{-\infty}^{+\infty}\int_\omega^{+\infty}x^k\psi(x, y)\,dy\,dx.$$
Make the transformation $t = (x - \rho y)/\sqrt{(1 - \rho^2)}$. The above then reduces to
$$\int_\omega^{+\infty}\int_{-\infty}^{+\infty}\{t\sqrt{(1 - \rho^2)} + \rho y\}^k\lambda(t)\lambda(y)\,dt\,dy.$$
Using a binomial expansion and integrating with respect to $t$, we obtain Theorem III. The
integrals which occur are incomplete gamma functions which may be evaluated by the
usual recursion relation.

For completeness we include the relation between $\rho(X, Y)$ and $\rho(X, Z)$, known to Karl Pearson:
$$\rho(X, Z) = \rho(X, Y)\,\frac{\lambda(\omega)}{\{p(\omega)q(\omega)\}^{1/2}}.$$
It follows from the original definition of biserial correlation, as given by Pearson, that r* is consistent. This fact is also an immediate consequence of relation (3.2) between r* and r: $r \to \rho(X, Z)$ in probability as $n \to \infty$. Thus
$$r^* = r\,\frac{\{\bar Z(1-\bar Z)\}^{1/2}}{\lambda(T)} \to \rho(X, Z)\,\frac{\{p(\omega)q(\omega)\}^{1/2}}{\lambda(\omega)} \quad \text{in probability},$$
and hence by the above, $r^* \to \rho(X, Y)$ in probability.

With respect to the magnitude of r*, it is well known that |r*| can be greater than 1. Something of the nature of this phenomenon can be understood by looking at r. In order to prove a result concerning the magnitude of r*, we shall need a preliminary result (see Tate, 1953, Lemma 2):

THEOREM IV. $p(x)q(x) \ge \tfrac12\pi\lambda^2(x)$, $(-\infty < x < +\infty)$, with equality at $0, \pm\infty$.

Now we have

THEOREM V. $r^*/r \ge (\tfrac12\pi)^{1/2}$.

Proof. Rewriting (3.2) as
$$\frac{r^*}{r} = \frac{\{\bar Z(1-\bar Z)\}^{1/2}}{\lambda(T)},$$
we have, in view of the definition of T,
$$\frac{r^*}{r} = \frac{\{p(T)q(T)\}^{1/2}}{\lambda(T)}.$$
Theorem IV applies for any T, so Theorem V is proved. As a consequence of Theorem V, we see that $r^* > +1$ or $r^* < -1$ according as $r > +(2/\pi)^{1/2}$ or $r < -(2/\pi)^{1/2}$.

Asymptotic normality of r*, which will be needed later in this section, is a consequence of a theorem of Cramér.

THEOREM VI. r* is asymptotically $N\{\rho, AV(r^*)\}$.

Proof. In expression (3.2) the term $\lambda(T)$ is seen to be an infinitely differentiable function of $\bar Z$. Thus, r* is a totally differentiable function of the sample means $\bar X$, $\bar Z$, $\overline{X^2}$, $\overline{XZ}$. Applying Cramér's theorem (see Cramér, 1946, p. 366), we have asymptotic normality with the asymptotic variance (3.4) calculated by Soper.

We shall now present two results which are more important than those just preceding. They concern the asymptotic, or large-sample, efficiency of r*, with respect to the class of estimators of $\rho$ based on the sample $(X_i, Z_i)$.

THEOREM VII. r* is an asymptotically most efficient estimator of $\rho$ when $\rho = 0$.

Proof. In view of Theorem VI on asymptotic normality, we have a right to inquire about the asymptotic efficiency of r*, which will be denoted by
$$\mathrm{AEff}(r^*) = \frac{AV(\hat\rho)}{AV(r^*)}.$$
It may be seen from Theorem I that
$$AV(\hat\rho \mid \omega, 0) = \frac{p(\omega)q(\omega)}{n\lambda^2(\omega)}. \tag{5.1}$$
Now, from (3.4) we observe that (5.1) coincides with $AV(r^* \mid \omega, 0)$. The conclusion follows from the definition of an asymptotically most efficient estimator.


THEOREM VIII. r* is an asymptotically least efficient estimator of $\rho$ when $|\rho| \to 1$.
Proof. An application of Theorem IV shows that 


(9) — pa: (- wn Pe «2, 
(soa) (rp) < 
Hence, recalling the definition of $g(x, \omega, \rho)$ in Theorem I, we see that all integrals of the form


Í O w, p) dx exist. Schwarz’s inequality shows that AV(f |o, p) is such that the term 
-0 

2 
in braces is non-vanishing. Thus, A V(f | ©,р)->0 ав |p | >1. From the fact AV (r* | v.p) pues 


as $|\rho| \to 1$, we conclude that $\mathrm{AEff}(r^* \mid \omega, \rho) \to 0$.
The special case $\omega = 0$ has interesting features which will appear in Theorems X and XI.


First we shall need another preliminary result (see Tate, 1953, Lemma 1):

THEOREM IX. $\{1 - 2p(x)\}\lambda(x) - x\,p(x)q(x) \ge 0$, $(x \ge 0)$.


THEOREM X. The asymptotic variance of r* has its minimum for each $\rho$ at $\omega = 0$.

Proof. In view of symmetry, it will be sufficient to show the result for $\omega > 0$. Let
$$A(\omega) = \frac{\omega^2 p(\omega)q(\omega)}{\lambda^2(\omega)} - \frac{\omega\{1 - 2p(\omega)\}}{\lambda(\omega)}, \qquad B(\omega) = \frac{p(\omega)q(\omega)}{\lambda^2(\omega)},$$
$$g(\omega) = \{1 - 2p(\omega)\}\lambda(\omega) - \omega p(\omega)q(\omega), \qquad h(\omega) = p(\omega)q(\omega) - \pi\{\lambda(\omega)\}^2/2.$$
From this point until the end of the proof, we shall omit $\omega$ whenever it appears as an argument of any function. We have (Tate, 1953)
$$g' = 2\lambda^2 - pq, \quad g'' = -4\omega\lambda^2 + (2q - 1)\lambda, \quad g(0) = g(\infty) = 0, \quad g'(0) > 0,$$
$$h' = \lambda(1 - 2q) + \pi\omega\lambda^2, \quad h'' = \lambda^2(\pi - 2 - 2\pi\omega^2) - \omega(1 - 2q)\lambda,$$
$$h(0) = h(\infty) = h'(0) = 0, \quad h''(0) > 0.$$
Accordingly, we have
$$A = -\omega g\lambda^{-2}, \quad B = h\lambda^{-2} + \tfrac12\pi, \quad \text{with } A \le 0,\ B \ge \tfrac12\pi,$$
both equalities holding at $\omega = 0$. The relation $AV(r^* \mid \omega, \rho) \ge AV(r^* \mid 0, \rho)$ for all $\rho$ may be written $\rho^2A + B \ge \tfrac12\pi$ for all $\rho$. Since $A \le 0$ this last expression is implied by $A + B \ge \tfrac12\pi$, which in turn is equivalent to $h \ge \omega g$. Thus, we must show $k = h - \omega g \ge 0$:
$$k' = 2\omega q(1 - q) - 2(2q - 1)\lambda + \omega(\pi - 2)\lambda^2,$$
$$k'' = 2q(1 - q) - \lambda^2\{6 - \pi + (2\pi - 4)\omega^2\}, \tag{5.2}$$
$$k(0) = k(\infty) = k'(0) = 0, \quad k''(0) = 1 - 3/\pi > 0.$$
We shall show that there exists no y such that $k'(y) = 0$, $k(y) < 0$. Suppose such a y does exist. Then
$$2(2q - 1)\lambda = 2yq(1 - q) + (\pi - 2)y\lambda^2, \qquad q(1 - q)(1 + y^2) - \tfrac12\pi\lambda^2 - y(2q - 1)\lambda < 0. \tag{5.3}$$
Substituting the right member of the first expression into the second, we have
$$2q(1 - q) < \lambda^2\{\pi + (\pi - 2)y^2\}.$$
Thus, $k''(y) < \lambda^2\{2\pi - 6 - (\pi - 2)y^2\}$. A negative maximum must, from (5.2), be followed by a negative minimum. Hence, from the above relation in $k''(y)$, there exist no extrema which exceed $\{(2\pi - 6)/(\pi - 2)\}^{1/2}$. Assuming there is a negative extremum of k, then there must be a negative minimum in (0, 1). Let y be this minimum point. Then $k''(y) > 0$, or from (5.2), $2q(1 - q) - \lambda^2\{6 - \pi + (2\pi - 4)y^2\} > 0$. Substituting the value of $2q(1 - q)$ obtained from the first equation in (5.3), we reach $(2q - 1) - y\lambda\{2 + (\pi - 2)y^2\} > 0$. The left member vanishes at y = 0 and has a negative derivative for $0 < y^2 < 1$. Therefore, there is no negative minimum in (0, 1), and from the previous argument $k \ge 0$, which completes the proof.

Since for any fixed $\rho$, r* is a better estimator when $\omega = 0$, it will be useful to have for this case something simpler in the way of an asymptotic distribution of r* than that contained in Theorem VI. We are therefore led to

THEOREM XI. When $\omega = 0$, we have to a close approximation
$$\tanh^{-1}\frac{2r^*}{\sqrt5} \sim N\Big\{\tanh^{-1}\frac{2\rho}{\sqrt5},\ \frac{5}{4n}\Big\}.$$

Proof.
$$nAV(r^* \mid 0, \rho) = \rho^4 - \tfrac52\rho^2 + \tfrac12\pi = (\tfrac54 - \rho^2)^2 - \frac{25 - 8\pi}{16}.$$
Dropping the last term and solving the equation
$$g'(x) = \frac{1}{\tfrac54 - x^2},$$
we have $g(x) = (2/\sqrt5)\tanh^{-1}(2x/\sqrt5)$. It is known that
$$\sqrt n\,\{g(r^*) - g(\rho)\} \sim N(0, 1),$$
so the theorem is proved.


Discussion of results concerning r* 


In looking over Theorems V, VI, VII, VIII, X and XI, several facts stand out. First, even though r* is consistent and asymptotically normal, it is still inadequate for estimating $\rho$ because of its possible magnitude and its lack of large-sample efficiency for large values of $|\rho|$. In the case of testing the hypothesis $H: \rho = \rho_0$ the first defect is not of so much consequence. Even in a problem of estimation, one can always operate under the rule: when $|r^*| \le 1$, estimate $\rho$ by r*; when $r^* > 1$, estimate $\rho = 1$; and when $r^* < -1$, estimate $\rho = -1$. The gross defect is lack of efficiency. In practically all applications it is of more interest to detect large values of $\rho$ than small values. In just such cases r* is a 'worst' estimator. On the other hand, again speaking in large-sample terms, when $\rho = 0$, r* is a 'best' estimator. Hence, if we base a test of $H: \rho = \rho_0$ on r*, good results should be achieved when $|\rho_0|$ is small. It is then recommended that r* be used for one and only one purpose, to test $H: \rho = \rho_0$ when $|\rho_0|$ is small. If, in addition, the assumption $\omega = 0$ is tenable, then the variance stabilizing transformation of Theorem XI may be used, calculations being performed with Table VB of Fisher (1946, p. 210). In such a case certain advantages (see Fisher 1946, pp. 197-204) will accrue. $\{nAV(r^*)\}^{1/2}$ is given in Table 2. Note that from Theorem X,† $\omega = 0$ is desirable on the grounds of precision.
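The estimation rule described in this discussion amounts to truncating r* to the interval [−1, 1]. A minimal sketch in Python, with a hypothetical function name:

```python
def truncated_estimate(r_star: float) -> float:
    """Estimate rho by r* when |r*| <= 1, by +1 when r* > 1, by -1 when r* < -1."""
    return max(-1.0, min(1.0, r_star))


if __name__ == "__main__":
    for value in (-1.3, -0.2, 0.41, 1.07):
        print(value, truncated_estimate(value))
```

Truncation removes the defect of magnitude only; it does nothing for the loss of efficiency at large $|\rho|$.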


6. SOLUTION OF THE LIKELIHOOD EQUATIONS 
Using the notation of (4.1), we have the likelihood equations 


(25, °) ns. 3 = 0. (6-1) 


te, p. 208), the solution of the four-parameter 
problem reduces to the problem of determining $\hat\omega$ and $\hat\rho$. It turns out that the likelihood equations for $\mu$ and $\omega$ may be combined to yield $\hat\mu = \bar x$. Similarly, a combination of likelihood equations for $\sigma^2$ and $\rho$ will give us $\hat\sigma^2 = s^2$. Details will be omitted.

We now replace $\mu$ and $\sigma$ by $\bar x$ and $s$ in the expression $(x_i - \mu)/\sigma$, which occurs in (6.1), and denote the result by $\hat x_i$. Also, let $L'$ denote the new likelihood function. The solution of $\partial L' = 0$ for $\hat\omega$ and $\hat\rho$ follows.


† Another point in favour of r* is indicated by the parallelism of Theorems II and X.
One may easily verify that 


ED L Viu), eve | a 
ie) _ -i-p YC) Elepo) (zipu) Wai) | 
=p) ^ dw» 1-09 


By the use of (6-2) the equations dL’ = 0 can be written as 


Siri - po) z- 6s - n (TP) E «| 3 
| (6-3) 


X(2;,— n [es.- (rE) = 0. 
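Now introduce the notation
$$\delta_i = 2z_i - 1, \quad y_i = (\omega - \rho\hat x_i)(1 - \rho^2)^{-1/2}, \quad \phi_i = \phi(\delta_i y_i), \quad A_i = \phi_i(\phi_i - \delta_i y_i). \tag{6.4}$$
Rewriting (6.3) again, in the new notation, we have
$$\sum\delta_i\phi_i = 0, \qquad \sum\delta_i\phi_i\hat x_i = 0. \tag{6.5}$$
Easy differentiation gives $\phi'(x) = \phi(x)\{\phi(x) - x\}$. Newton's method in two variables gives the following equations in $\Delta\omega$ and $\Delta\rho$, where $\Delta\omega = \omega - \omega_1$, $\Delta\rho = \rho - \rho_1$, $\omega_1$ and $\rho_1$ being initial guesses:
$$\frac{\sum A_i}{(1 - \rho_1^2)^{1/2}}\,\Delta\omega + \frac{\rho_1\omega_1\sum A_i - \sum A_i\hat x_i}{(1 - \rho_1^2)^{3/2}}\,\Delta\rho = -\sum\delta_i\phi_i,$$
$$\frac{\sum A_i\hat x_i}{(1 - \rho_1^2)^{1/2}}\,\Delta\omega + \frac{\rho_1\omega_1\sum A_i\hat x_i - \sum A_i\hat x_i^2}{(1 - \rho_1^2)^{3/2}}\,\Delta\rho = -\sum\delta_i\phi_i\hat x_i.$$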


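Let $\Delta$ be the determinant of the coefficients. The method of solution will then be the following:

(i) Compute $\omega^*$, r* from the sample $(x_i, z_i)$ (i = 1, 2, ..., n), where r* is the sample biserial correlation coefficient and $\omega^*$ is the solution of the equation $p(\omega) = \bar z$. Now, let $\omega_1 = \omega^*$ and
$$\rho_1 = \begin{cases} r^* & \text{when } |r^*| < 1, \\ +0.90 & \text{when } r^* \ge 1, \\ -0.90 & \text{when } r^* \le -1. \end{cases} \tag{6.6}$$

(ii) Compute $\delta_i$, $y_i$, $\phi_i$, $\delta_i\phi_i$, $\delta_i\phi_i\hat x_i$, $A_i$, $A_i\hat x_i$, $A_i\hat x_i^2$ for i = 1, 2, ..., n, where $\delta_i$, $y_i$, $\phi_i$, $A_i$ are defined in (6.4), and the numerical values of the $\phi_i = \phi(\delta_i y_i)$ are obtained from the tables printed on pp. 217-221 below. Note that these tables must be entered with $X = \delta_i y_i$, while $\phi = Z/Q$ if $X > 0$ and $Z/P$ if $X < 0$.

(iii) Evaluate the three determinants
$$\Delta = \frac{1}{(1 - \rho_1^2)^2}\begin{vmatrix} \sum A_i & \rho_1\omega_1\sum A_i - \sum A_i\hat x_i \\ \sum A_i\hat x_i & \rho_1\omega_1\sum A_i\hat x_i - \sum A_i\hat x_i^2 \end{vmatrix},$$
$$\Delta\omega = \frac{1}{\Delta(1 - \rho_1^2)^{3/2}}\begin{vmatrix} -\sum\delta_i\phi_i & \rho_1\omega_1\sum A_i - \sum A_i\hat x_i \\ -\sum\delta_i\phi_i\hat x_i & \rho_1\omega_1\sum A_i\hat x_i - \sum A_i\hat x_i^2 \end{vmatrix},$$
$$\Delta\rho = \frac{1}{\Delta(1 - \rho_1^2)^{1/2}}\begin{vmatrix} \sum A_i & -\sum\delta_i\phi_i \\ \sum A_i\hat x_i & -\sum\delta_i\phi_i\hat x_i \end{vmatrix}.$$

(iv) Obtain $(\omega_2, \rho_2)$ from $(\Delta\omega, \Delta\rho)$ and $(\omega_1, \rho_1)$, and repeat the process using $\omega = \omega_2$, $\rho = \rho_2$ in place of $\omega_1$ and $\rho_1$.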

The rule given in (i) is somewhat arbitrary, but is believed to be a good rule of thumb. 
The longest stage in the scheme outlined above is the determination of $\phi_i$, i = 1, 2, ..., n, 
from the tables. 
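The scheme of steps (i) to (iv) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's program: $\phi(x)$ is taken to be the normal ordinate divided by the upper-tail area at x (which reproduces the Z/Q and Z/P rule of step (ii)), the Jacobian entries are those obtained by differentiating (6.5) with respect to $\omega$ and $\rho$, and all function names are hypothetical. With the sums printed at the foot of Table 1 the sketch returns $\Delta \approx -122.54$ and $\Delta\rho \approx 0.079$; for $\Delta\omega$ it returns about 0.128, whereas the scanned table reads 0.108, and no attempt is made here to account for that difference.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def phi(x: float) -> float:
    """Ordinate over upper-tail area (Z/Q for x > 0, Z/P for x < 0)."""
    return STD.pdf(x) / STD.cdf(-x)


def row_terms(x: float, z: int, omega: float, rho: float):
    """Return delta_i, y_i, phi_i and A_i of (6.4) for one observation."""
    delta = 2 * z - 1
    y = (omega - rho * x) / math.sqrt(1.0 - rho * rho)
    ph = phi(delta * y)
    return delta, y, ph, ph * (ph - delta * y)


def newton_increments(s_dphi, s_dphix, s_a, s_ax, s_axx, omega, rho):
    """Solve the two Newton equations by Cramer's rule.

    Arguments are the sums of delta*phi, delta*phi*x, A, A*x and A*x^2,
    all evaluated at the current (omega, rho).
    """
    c1 = (1.0 - rho * rho) ** -0.5
    c3 = (1.0 - rho * rho) ** -1.5
    j11, j12 = s_a * c1, (rho * omega * s_a - s_ax) * c3
    j21, j22 = s_ax * c1, (rho * omega * s_ax - s_axx) * c3
    det = j11 * j22 - j12 * j21
    d_omega = (-s_dphi * j22 + s_dphix * j12) / det
    d_rho = (-j11 * s_dphix + j21 * s_dphi) / det
    return det, d_omega, d_rho


def newton_step(xs, zs, omega, rho):
    """One pass of steps (ii) to (iv) on standardized x values."""
    sums = [0.0] * 5
    for x, z in zip(xs, zs):
        delta, _, ph, a = row_terms(x, z, omega, rho)
        terms = (delta * ph, delta * ph * x, a, a * x, a * x * x)
        for k, term in enumerate(terms):
            sums[k] += term
    _, d_omega, d_rho = newton_increments(*sums, omega, rho)
    return omega + d_omega, rho + d_rho


if __name__ == "__main__":
    print(newton_increments(-1.380, 0.343, 11.865, 3.409, 8.127, 0.126, 0.410))
```

The first row of Table 1 ($x = 0.24$, $z = 0$ at $\omega_1 = 0.126$, $\rho_1 = 0.410$) gives $y \approx 0.030$, $\phi \approx 0.779$ and $A \approx 0.630$ under these assumptions, in agreement with the tabled entries.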

We shall now present an illustration of the method. In order to have a good vantage point for observing the way the calculations run, we select a random sample from a bivariate normal population with zero means, unit variances and correlation $1/\sqrt2$. A table of random numbers from such a population is not available directly†, but can be constructed from a table of random numbers from $N(0, 1)$ as follows: Let $U = N(0, 1)$, $V = N(0, 1)$, with U and V independent. Now let $X = U$ and $Y = (U + V)/\sqrt2$. After introducing the dichotomy
$$Z = 1 \text{ if } Y \ge \tfrac12, \qquad Z = 0 \text{ if } Y < \tfrac12, \tag{6.7}$$
we have the desired set-up. It is important to note that for this particular example, the $\hat x_i$ of the computing scheme become merely $x_i$, since it is known that $\mu = 0$ and $\sigma^2 = 1$. For computations see Table 1 below.

A second iteration resulted in $\omega_3 = 0.251$, $\rho_3 = 0.489$. Since $\hat\rho$ remained unchanged in the third decimal, the results were not included. Recall that the true value of $\rho$ is 0.707. On the basis of our sample of 20, $\hat\rho = 0.489$ is the best we can do. However, by using the iterative scheme instead of r* we removed 27% of the error.

Although r* is used merely to start the iteration, the fact that
$$|r^* - E(r^*)| / \{AV(r^*)\}^{1/2} = 1.64$$
serves to indicate that we could have been more fortunate in our selection of a sample.
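The starting values of the illustration can be recomputed from the column totals given with Table 1. The Python sketch below is an assumption-laden check rather than the author's working: it uses the usual form of the biserial coefficient, the sample covariance of x and z divided by $s\,\lambda(\omega^*)$ with divisor n in s, which is presumed to agree with (3.2); it takes $p(\omega)$ to be the area above $\omega$; and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def biserial_from_sums(n, sum_x, sum_x2, sum_z, sum_xz):
    """Return (omega*, r*) from the sample totals."""
    std = NormalDist()
    x_bar, z_bar = sum_x / n, sum_z / n
    s = math.sqrt(sum_x2 / n - x_bar ** 2)      # divisor n (assumed)
    omega_star = std.inv_cdf(1.0 - z_bar)        # solves p(omega) = z_bar
    cov_xz = sum_xz / n - x_bar * z_bar
    r_star = cov_xz / (s * std.pdf(omega_star))
    return omega_star, r_star


# Totals from Table 1: n = 20, sum x = 5.09, sum x^2 = 15.626, sum z = 9, sum xz = 5.04
omega_star, r_star = biserial_from_sums(20, 5.09, 15.626, 9, 5.04)

# Share of the error in r* removed by the iterated estimate 0.489
true_rho, rho_hat = 1.0 / math.sqrt(2.0), 0.489
error_removed = (rho_hat - r_star) / (true_rho - r_star)

if __name__ == "__main__":
    print(round(omega_star, 3), round(r_star, 3), round(error_removed, 2))
```

Under these assumptions the totals give $\omega^* \approx 0.126$ and $r^* \approx 0.410$, and the move from 0.410 to 0.489 covers roughly 27% of the distance to the true value 0.707.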


REFERENCES 


CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press. 

DU BOIS, P. H. (1942). A note on the computation of biserial r in item validation. Psychometrika, 7, 143. 

DUNLAP, J. W. (1936). A nomograph for computing biserial r. Psychometrika, 1, 59. 

FISHER, R. A. (1946). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. 

MARITZ, J. S. (1953). Estimation of the correlation coefficient of a bivariate normal population where one of the variables is dichotomized. Psychometrika, 18, 97. 

PEARSON, K. (1909). On a new method for determining the correlation between a measured character A and a character B. Biometrika, 7, 96. 

ROYER, E. B. (1941). Punched cards methods for determining biserial coefficients. Psychometrika, 6, 55. 

SOPER, H. E. (1913). On the probable error for the biserial expression for the correlation coefficient. Biometrika, 10, 384. 

TATE, R. F. (1953). On a double inequality of the normal distribution. Ann. Math. Statist. 24, 133. 

TOCHER, K. D. (1949). A note on the analysis of grouped probit data. Biometrika, 36, 9. 


† [A table of Correlated Random Normal Deviates is now at press and will be issued shortly from the Department of Statistics, University College, London as Tracts for Computers No. xxvi, ED.] 


Table 1. Computation of example 


x_i | z_i | y_i | δ_i φ_i | δ_i φ_i x_i | φ_i − δ_i y_i | A_i | A_i x_i | A_i x_i² 
0-24 0 0-030 — 0-779 0-187 0-809 0-630 0-151 
0-63 1 —0-146 0-707 0-445 0-853 0-603 0-380 
— 0-59 0 0-404 — 0-560 — 0-330 0-964 0-540 — 0-319 
1-08 0 — 0-348 — 1:032 1:115 0-684 0-706 0-762 
0-06 0 0-111 — 0-729 0-044 0:840 0-451 0-027 
— 0-01 0 0-143 — 0-709 — 0-007 0-852 0-604 — 0-006 
159 | 1 | —0-578 0-470 0-747 1:048 | 0-493 0-784 
— 0-41 0 0-323 — 0-604 — 0-248 0-927 0-560 — 0-230 
—0:33 | 1 0-287 0-989 —0-326 | 0-702 | 0-694 — 0-229 
0-99 1 —0-308 0:613 0-607 0-921 0-565 0-559 
0-30 0 0-003 — 0-796 0-239 0:799 0:636 0-191 
+207 | 0 1:070 | — | 0-262 | —0-542 1:332 | 0-349 — 0:722 
—031 | 0 0.283 | — | 0-655 | —0-138 | 0-888 | 0-582 —0-122 
—047 | 1 0-350 1:033 0-486 | 0-683 | 0-706 — 0-332 
1:28 1 — 0-438 0-541 0-692 0-979 0:530 0-678 
082 | 1 | —0-231 0-656 0-538 | 0-887 | 0:582 0-477 | 0:391 
1:06 | 1 | —0-339 0-595 0-631 | 0-934 | 0-556 0-589 | 0-624 
1:09 | 0 | —0353 | — | 1:035 1128 | 0-682 | 0-706 0-770 | 0-829 
0:57 | 0 | —0-119 | — | 0-875 0-499 | 0:756 | 0-662 0-377 | 0-215 
—0-53 | 0 0.377 | — | 1052 | —0-558 | 0-675 | 0-710 — 0:376 | 0-199 
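$\sum x_i = 5.09$   $\sum x_i z_i = 5.04$   $r^* = +0.410\ (\rho_1)$   $y_i = 0.138 - 0.450x_i$ 
$\sum x_i^2 = 15.626$   $\sum z_i = 9$   $\omega^* = +0.126\ (\omega_1)$ 
$\sum\delta_i\phi_i = -1.380$   $\sum A_i = 11.865$   $\sum A_i x_i^2 = 8.127$   $\Delta = -122.54$ 
$\sum\delta_i\phi_i x_i = 0.343$   $\sum A_i x_i = 3.409$   $\Delta\omega = 0.108$   $\Delta\rho = 0.079$ 
$\omega_2 = 0.234$   $\rho_2 = 0.489$ 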


Table 2. The asymptotic standard deviation of r* (biserial r) as a function of $p$ and $\rho$ (equation (3.4)) 

All values must be divided by $\sqrt n$, as the quantity tabled is $\{nAV(r^*)\}^{1/2}$. 


THE NORMAL PROBABILITY FUNCTION: TABLES OF CERTAIN 
AREA-ORDINATE RATIOS AND OF THEIR RECIPROCALS 


EDITORIAL 
Notation. Writing X as a standardized normal deviate, the following notation will be used:* 
$$Z = \frac{1}{\sqrt{(2\pi)}}\,e^{-\frac12X^2}, \qquad P = \int_{-\infty}^{X} Z(u)\,du, \qquad Q = 1 - P = \int_{X}^{\infty} Z(u)\,du.$$
The table then gives for X = 0.00 (0.01) 3.00 the values of the four ratios P/Z, Q/Z, Z/P and Z/Q. For the last three ratios, the results are extended for argument X = 3.00 (0.01) 4.00 (0.05) 5.00. The ratios P/Z, Q/Z and Z/Q are tabled to five decimal places and those for Z/P to five significant figures. 
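For readers who wish to check or extend individual entries, the four ratios can be computed directly. The following Python sketch uses only the standard library; the function name is hypothetical, and the definitions of Z, P and Q are those given in the Notation paragraph.

```python
from statistics import NormalDist

STD = NormalDist()


def area_ordinate_ratios(x: float):
    """Return (P/Z, Q/Z, Z/P, Z/Q) for a standardized normal deviate x."""
    z = STD.pdf(x)     # ordinate Z
    p = STD.cdf(x)     # area P below x
    q = STD.cdf(-x)    # area Q above x, computed without forming 1 - P
    return p / z, q / z, z / p, z / q


if __name__ == "__main__":
    for x in (0.0, 1.5, 2.0):
        print(x, [round(v, 5) for v in area_ordinate_ratios(x)])
```

Q is obtained as the lower area at −x rather than as 1 − P, which avoids loss of significance for large X.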

Derivation. The ratio P/Z was freshly computed in the Department of Statistics, University College, London, using the Tables of Normal Probability Functions, National Bureau of Standards (1953). The ratio Q/Z has been taken directly from Table III of Karl Pearson's Tables for Statisticians and Biometricians, Part II (1931). The ratios Z/P and Z/Q were calculated by Prof. Z. W. Birnbaum using the National Bureau of Standards Tables, and were originally included in two tables forming part of the preceding paper by Dr R. F. Tate (1955). With Dr Tate's and Prof. Birnbaum's approval it was, however, decided to give separately the more comprehensive table which follows. This will form a useful companion to Table II, Tables for Statisticians and Biometricians, Part II, which gives the same four ratios for an argument of P instead of X. It is intended to reproduce both tables in the second volume of Biometrika Tables for Statisticians. 

Having regard to this wider use, Dr Tate's notation has been modified, so that his $\phi(x)$ (see p. 206 above) becomes Z/P for $x = -X < 0$ and Z/Q for $x = X > 0$. 

Interpolation. For a large part of the table accuracy to the full number of decimal places given may be obtained by linear interpolation. Where this fails, the Bessel formula 
$$y_\theta = (1 - \theta)y_0 + \theta y_1 - \tfrac14\theta(1 - \theta)(\delta^2y_0 + \delta^2y_1)$$
is adequate, except for P/Z with X > 2.50, when full use of second differences in Everett's formula becomes necessary. 
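The Bessel formula quoted above needs four consecutive tabular values, two on either side of the point of interpolation. A minimal Python sketch, with a hypothetical function name and with $\delta^2y_0$ and $\delta^2y_1$ taken to be the central second differences at $y_0$ and $y_1$:

```python
def bessel_interpolate(y_m1, y0, y1, y2, theta):
    """Interpolate between y0 and y1 at fraction theta of the tabular interval.

    y_m1 and y2 are the tabular values immediately before y0 and after y1.
    """
    d2_0 = y_m1 - 2.0 * y0 + y1    # second difference centred on y0
    d2_1 = y0 - 2.0 * y1 + y2      # second difference centred on y1
    linear = (1.0 - theta) * y0 + theta * y1
    return linear - 0.25 * theta * (1.0 - theta) * (d2_0 + d2_1)


if __name__ == "__main__":
    # Z/Q at X = 0.00, 0.01, 0.02, 0.03 from the main table; interpolate to X = 0.015
    print(bessel_interpolate(0.79788, 0.80426, 0.81066, 0.81708, 0.5))
```

The formula is exact for a quadratic, which gives a simple check: with the values 1, 0, 1, 4 of $x^2$ at x = −1, 0, 1, 2 and $\theta = 0.3$ it returns 0.09.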


The normal probability function; extension of main table 


 X      Q/Z       Z/P          Z/Q 

4.00   0.23665   0.0³13383   4.22561 
4.05   0.23401   0.0³10944   4.27330 
4.10   0.23143   0.0⁴89264   4.32103 
4.15   0.22890   0.0⁴72627   4.36880 
4.20   0.22642   0.0⁴58944   4.41662 

4.25   0.22399   0.0⁴47719   4.46447 
4.30   0.22161   0.0⁴38536   4.51237 
4.35   0.21928   0.0⁴31042   4.56030 
4.40   0.21700   0.0⁴24943   4.60827 
4.45   0.21476   0.0⁴19992   4.65628 

4.50   0.21257   0.0⁴15984   4.70432 
4.55   0.21042   0.0⁴12747   4.75240 
4.60   0.20831   0.0⁴10141   4.80051 
4.65   0.20624   0.0⁵80472   4.84865 
4.70   0.20421   0.0⁵63698   4.89682 

4.75   0.20222   0.0⁵50295   4.94503 
4.80   0.20027   0.0⁵39613   4.99326 
4.85   0.19835   0.0⁵31122   5.04153 
4.90   0.19647   0.0⁵24390   5.08983 
4.95   0.19462   0.0⁵19066   5.13815 

5.00   0.19281   0.0⁵14867   5.18650 


* This conforms to the notation used by Pearson and Hartley in Biometrika Tables for Statisticians, 1 (1954). 


The normal probability function. Table of area/ordinate ratios and their reciprocals 


X | P/Z | Q/Z | Z/P | Z/Q | P/Z | Q/Z | Z/P | Z/Q 
0-00 | 1.25331 | 1-25331 | 0-79788 | 0-79788 196402 | 0-87636 | 0-50916 | 1-14108 
-01 | 126338 | 124338 | -79153 | -80426 | 1.98399 | -87078 | -50404 | 1-14840 
02 | 127357 | 123356 | -78520 | -81066 | 2-00426 | -86525 | 2.49894 | 1.15574 
-03 | 1-28389 | 12387 | -77888 | -81708 | 2-02483 | -85977 | :49387 | 1-16310 
-04 | 129434 | 121430 | -77260 | .82352 | 204572 | -85436 | -48882 | 1-17047 
0-05 | 1-30492 | 1-20484 | 0-706633 | 0-82999 | 2-06693 | 0-84900 | 0-48381 | 1-17786 
06 | 1-31564 | 1-19550 | -76008 | -83047 | 2.08846 | -84370 | -47882 | 6 
07 | 132650 | 118627 | -75386 | -84298 2.11032 | -83845 | -47386 | 1-19268 
:08 | 1.33750 | 1-17716 | -74766 | -84950 213252 | -83326 | -46893 | 1-20011 
09 | 134804 | 1-16816 | -74149 | -85605 2-15506 | -82812 | -46402 | 120756 
0-10 | 1.35993 | 1.15926 | 0-733533 | 0-86262 | 2-17795 | 0-82303 | 0-45915 | 1-21503 
11 | L37136 | 1-15048 | -72920 | -86921 2.20120 | -81799 | -45430 | 122251 
12 | 1.38295 | 1-14179 | -72309 | -87582 2-22481 | -81301 | -44948 | 123000 
13 | 1:39468 | 113321 | -71701 | -88244 | 2.24879 | -80807 | -44468 | 123751 
14 | 140058 | 112474 | -71095 | -88910 2.27315 | -80319 | -43992 | 124504 
0-15 | 1-418602 | 1-11636 | 0-70491 | 0-89577 2.29789 | 0-79835 | 0-43518 | 1-25258 
-16 | 1-43083 | 110809 | -69889 | -90246 2.32302 | 79357 | -43047 | 1.26013 
17 | 1-44320 | 109991 | -69290 | -90917 | 234855 | 78883 | -42579 | 1-26770 
18 | 145574 | 1-09183 | -68694 | -91590 2-97449 | 78414 | -42114 | 1.27529 
19 | 1.46844 | 108384 | -68099 | -92265 | 240085 | -77949 | -41652 | 1-28289 
0-20 | 1-48132 | 1.07594 | 0-607507 | 0-92942 2.42763 | 0-77489 | 0-41192 | 1-29050 
21 | 1-49437 | 1-06814 | -66918 | -93620 2-45484 | -77034 | -40736 | 120813 
22 | 150760 | 106043 | -66331 | -94301 2-48249 | -76583 | -40282 | 130577 
-23 | 1-52101 | 1-05281 | -65746 | -94984 2.51059 | -76137 | -39831 | 131342 
24 | 153460 | 104527 | -65164 | -95669 2.53915 | -75695 | -39383 | 132109 
0-25 | 154837 | 103782 | 0-64584 | 0-96355 2.50817 | 0-75257 | 0-38938 | 1-32878 
26 | 156234 | 103046 | -64007 | -97044 2.59767 | -74824 | -38496 | 133648 
27 | 157650 | 102318 | -63432 | -97734 2.62766 | -74394 | -38057 | 1:34419 
28 | 1-59085 | 1-01599 | -62859 | -98426 265814 | -73969 | -37620 | 135191 
29 | 1-60541 | 100887 | -62290 | -99121 2-68913 | -73548 | -37187 | 135905 
0-30 | 1-62017 | 1-00184 | 0-61722 | 099817 272003 | 073131 | 036756 | 1-360740 
31 | 163513 | 0-99488 | «61157 | 1-00514 2-752606 | -72718 | 36328 | 1-37517 
“32 | 105030 | -98801 | -60595 | 1-01214 2.78523 | -72309 | -35904 | 1.38295 
33 | 1.00509 | -98121 | -60035 | 1-01916 2-81835 | -71904 | -35482 | 139074 
34 | 1-68130 | -97448 | -59478 | 1-02619 2.85202 | -71503 | .35063 | 139854 
0-35 | 169713 | 0-960783 | 058923 | 1.03324 2-88626 | 071106 | 0-34647 | 1-40636 
'86 | 1.71318 | -96126 | -58371 | 104031 2.92109 | -70712 | -34234 | 141419 
37 | 172046 | -95475 | -57821 | 1-04739 2.95651 | -70322 | -33824 | 142204 
'88 | 174598 | -94832 | -57274 | 1.05450 2.09264 | .69935 | -33416 | 142989 
'89 | 1.76273 | -94196 | -56730 | 1-061602 302918 | -69553 | -33012 | 1.43776 
0-40 | 1.77973 | 0-93567 | 0-56188 | 106876 3-06646 | 0-69173 | 0-32611 | 144504 
41 | 1-790697 | -92944 | -55649 | 107591 3-10438 | -68798 | -32212 | 145354 
42 | L81447 | .92329 | -55113 | 1-08308 314296 | ‘68425 | -31817 | 1446144 
43 | 1.83222 | .91720 | -54579 | 1-09028 3-18222 | .68057 | -31425 | 1-46936 
44 | 185023 | -91118 | -54047 | 48 3-22216 | -67691 | -31035 | 147729 
0-45 | 186850 | 0-900522 | 0-53519 | 1-10471 326280 | 0-67329 | 0-306048 | 1-48524 
46 | 1-88704 | -89932 | -52993 | 1-11195 330416 | ‘66971 | -30265 | 149319 
47 | 1-90586 | 89349 | -52470 | 1-11921 3-34624 | -66615 | -29884 | 150116 
48 | 192490 | -88772 | -51949 | 1-126048 3-38908 | -66263 | :29506 | 150914 
49 | 194434 | -88201 | -51431 | 1-13377 343208 | -65914 | -29132 | 1-51713 
0-50 | 1-96402 | 087636 | 0-50916 | 1-14108 347705 | 0-605568 | 0-28760 | 1-52514 


$Z(X) = e^{-\frac12X^2}/\sqrt{(2\pi)}$, $P(X) = 1 - Q(X) = \int_{-\infty}^{X} Z(u)\,du$. 


P|Z 


3.47705 


3.52222 
3-560821 
3.61502 
3۰66268 


| 3:71121 


3۰76062 


| 9:81094 


3۰86218 


| 3-91437 | 
| 3-96752 


4:02166 
4:07681 


4-13299 | 


4۰19023 


4۰24854 
4-30795 
4-36849 
4:43018 
4:49305 


455713 
4.62243 
468900 
475085 
4۰82603 


4-89655 
4-96845 
5.04176 
5:11652 
5:19276 


5:27051 
5:34980 
5:43069 
5:51319 
5۰59735 


5۰68321 
5۰77081 
5۰86019 
5۰95139 
6۰04446 


6۰13944 
6۰23638 
6.33533 
6۰43632 
6۰53942 


6۰64467 
6۰75213 
6۰86186 
6۰97389 
7۰08830 


7۰20514 




QIZ | ZIP | 210 x PjZ | QjZ | ZIP | 
0-65568 | 0-28760 | 1-52514 1:50 | 720514 0-51582 | 0-13879 1:93868 
65225 | 298391 | 1:53315 51 1:32448 :51356 :13653 1:94719 
«64885 | -28025 | 1.54118 ۰52 7۰44636 -51133 -13429 1:95570 
+64549 -27662 | 1-54922 -53 | 757087 | -50911 :13208 1:96423 
-64215 -27302 | 1:55726 54 | 7۰69805 -50690 -12990 1-97276 
| 
0-63885 | 0-26945 | 1-56532 | 1-55 7-82799 | 0-50472 | 0-12775 1:98130 
"63557 -26591 | 1-57340 -56 7-96074 -50255 -12562 1:98985 
-63232 | -26240 | 1-58148 57 8.09639 -50040 12351 1-99841 
-62910 | -25892 | 1-58958 -58 8-23500 | -49826 12143 2۰00698 
-62591 :95547 | 1-59768 59 8۰37664 49614 -11938 2-01555 
0-62274 | 0-25205 | 1-60580 1:60 8.52140 | 0-49404 | 0-11735 2۰02413 
-61961 ‘24865 | 1-61392 61 8-66935 -49195 “11535 2-03272 
-61650 -24529 | 1-62206 62 8-82058 -48988 ‚11337 2۰04131 
-61342 24196 | 1-63021 63 8-97517 48782 11142 2-04992 
۰61036 -23865 | 1-63837 64 9.13320 -48578 -10949 9.05853 
0-60733 | 0-23538 | 1-64654 1-65 9.29477 | 0-48376 | 0-10759 2-006715 
-60433 -23213 | 1:65472 66 9-45996 48175 +10571 9.07578 
-60135 -22891 | 1-66292 67 9-62887 -47975 ‚10385 2-08441 
-59840 -22572 | 1-67112 68 9-80159 47777 10202 2-09305 
-59548 -22257 | 1-67933 69 | 997824 -47580 -10022 2-10170 
| 
0-59257 | 0-21944 | 1-68755 1:70 | 10-15889 | 0-47385 0-098436 | 2-11036 
-58970 -21634 | 1-69578 71 | 10-34367 -47192 -096677 | 2-11902 
-58684 -21326 | 1-70403 72 | 10:53268 -46999 094943 | 2-12769 
-58402 -21022 | 1.71228 73 | 10-72604 -46808 -093231 | 2-13637 
:58121 -20721 | 1.72054 474 | 10:92384 -46619 091543 | 2-14506 
0-57843 | 0-20422 1-72882 | 1-75 | 11-12622 0-46431 | 0-089878 | 2-15375 
-57567 -20127 | 1-73710 -76 | 11:33330 :46244 :088236 | 2-16245 
-57294 -19834 | 1-74539 -77 | 11:54520 -46058 -086616 | 2-17115 
-57022 19544 | 1-75369 -78 | 11:76205 -45874 ۰085019 | 2-17987 
-56754 -19258 | 1-76201 79 | 11:98397 -45692 -083445 | 2-18859 
0-56487 | 0-18974 1-77033 | 1:80 | 12-211 12 | 0-45510 | 0:081893 | 2- 19731 
-56222 -18692 | 1-77866 -81 | 12:44362 -45330 -080362 | 2-20605 
-55960 -18414 | 1-78700 :82 | 12:68163 -45151 «078854 | 2:21479 
-55699 -18138 | 1-79535 -83 | 12:92528 -44973 -077368 | 2۰22353 
-55441 -17866 | 1-80371 :84 | 13:17474 -44797 -075903 | 2.23229 
0:55185 | 0.17596 1.81208 | 1:85 13:43017 | 0-446022 0-074459 | 2-24105 
-54931 -17329 | 1-82046 -86 | 13:69171 -44448 -073037 | 2-24982 
-54679 -17064 | 1-82884 :87 | 13:95955 -44275 «071636 | 2-25859 
.54430 | -16803 | 1-83724 .88 | 14:23386 | -44104 070255 | 2-260737 
:54182 -16544 | 1:84564 :89 | 14:51481 -43934 -068895 | 2:27615 
0-53936 | 0-16288 1۰85406 | 1:90 14:80258 | 0-43765 0-067556 | 2-2845 
-53692 ‚16035 | 1-86248 91 | 15:09737 -43597 -066237 | 2-29375 
۰53450 :15784 | 1:87091 -92 | 15:39937 -43430 -064938 | 2-30255 
.53210 | -15537 | 1:87935 .93 | 15:70877 | -43265 -063659 | 6 
.52972 | -15292 | 1-88780 .94 | 16-02580 | -43100 -062399 | 2-32018 
0.52735 | 0-15050 | 1-89626 | 1-95 | 16:35065 0-42937 | 0-061160 | 2-32900 
-52501 ‚14810 | 1:90473 -96 | 16:68354 -42775 | -059939 | 2-33784 
-52268 | ‘14573 | 1:91320 .97 | 17:02472 | -42614 058738 | 2:34667 
.52038 | -14339 | 1.92169 | -98 | 17-37440 .49454 | -057556 | 2:35551 
.51809 | -14108 | 1-93018 .99 | 17-73282 | -42295 056393 | 2-36436 
0-51582 | 0.13879 | 1:93868 18-10025 | 0-42137 | 0-055248 2.31322 




The normal probability function. Table of areajordinate ratios and their reciprocals (cont.) 


x PjZ 0/2 ZJP Z/Q 


2-00 | 18-10025 | 0-42137 | 0-055248 | 2-37322 
:01 | 18-47692) -41980 | -054122 | 2.38208 
:02 | 18-86311 | -41825 | -053014 | 2.39094 
"03 | 19-25908 | -41670 | -051924 | 2-39981 
:04 | 19-66512 | -41516 | -050851 | 2-40869 


2-05 | 20-08152 | 0-41364 | 0-049797 | 2-41758 
:06 | 20-50857 | -41212 | -048760 | 2.42646 
:07 | 20-94657 | -41062 | -047741 | 2-43536 
:08 | 21-39586 | -40912 | -046738 | 2.44496 
:09 | 21-85675 | -40764 | -045752 | 2.45317 


2:10 | 22-32959 | 0-40616 | 0-044784 | 2.46208 
11 | 22-81471 | -40470 | -043831 | 2-47100 
12 | 23-31249 | .40324 | -042896 | 247992 
3 | 23-82329 | -40179 | -041976 | 2.48885 
+14 | 24:34749) -40036 | -041072 | 249778 


2-15 | 24-88550 | 0-39893 | 0-040184 | 2-50672 
16 | 25:43771 | -39751 | -039312 | 2-51566 
17 | 26-00455 | -39610 | -038455 | 2-52461 
8 | 26-58645 | -39470 | -037613 | 2-53357 


X P/Z | QZ 2[Р 219 


2-50 | 56-69633 | 0-35427 | 0-017638 | 2.89974 
51 58:14464 | -35313 | -017198 | 2-83186 
:52 | 59-63565 | -35199 | -016768 | 2-84097 
'53 | 6117075 | -35087 | -016348 | 2-85010 
:54 | 62-75138 | -34975 | -015936 | 2-85922 


2-55 64-37902 | 0-34863 | 0-015533 | 2-86835 
"56 66-05523 | -34753 | -015139 | 2-87748 
57 67-78159 | ‘34643 | -014753 | 2-88662 
:58 69-55976 | -34533 | -014376 | 2.89576 
59 71-39146 | -34425 | -014007 | 2-90491 


2-60 73:27844 | 0-34316 | 0-013647 | 2.91406 
`61 75.22256 | -34209 | -013294 | 2-92321 
۰62 77-22570 | -34102 | -012949 | 2.93237 
63 79-28985 | -33996 | -012612 | 2-94153 
:64 | 8141704 | -33890 | -012282 | 2-95070 


2-65 | 83-60939 | 0-33785 | 0-011960 | 2.95987 
66 | 85-86908 | -33681 | -011646 | 2-096904 
67 | 88-19839 | -33577 | -011338 | 2.97822 
68 | 90-59968 | -33474 | -011038 | 2-98740 




`31 | 35:74783 | -37729 | .027974 2-65046 
:32 | 36-59516 | -37601 | -027326 2۰65948 
۰33 | 37:46608 | .37474 | -026691 2۰66851 
'34 | 38:36133 | -37348 | .026068 2-607755 


2-35 | 39-28165 | 0:37222 | 0-025457 2۰68659 
‘36 | 40-22783 | -37097 | .094858 2-69563 
:37 | 41:20068 | -36973 | -024271 2۰70468 
`38 | 42-20103 | -36850 | -023696 2.71374 
‘39 | 43:22974 | -36727 | -023132 2.72280 


2-40 | 44-28771 | 0-36605 | 0-022580 2.73186 
"41 | 45:37586 | -36484 | -022038 2۰74093 
42 | 46-49515 | .36364 | -021508 2۰75000 
`43 | 47-64656 | -36244 | -020988 2۰75908 
44 | 48-83112 | -36125 | -020479 2۰76816 


2-45 | 50-04988 | 0-36007 | 0-019980 2-77725 
"46 | 51-30393 | -35889| -019492 2-78634 
47 | 52-59442 | .35773 | -019013 2.79543 
48 | 53-92250 | -35657 | -018545 2۰80453 
"49 | 55-28938 | .35541 | -018087 2۰81364 


2-50 | 56:69633 | 0-35427 | 0-017638 


:81 | 129-60721 | -32184 | -0277156 | 3-10710 
:82 | 133-31763 | -32089 | -0275009 | 3-11633 
"83 | 137-14770 | -31994 | -0272914 | 3-12556 
:84 | 141۰10162 | -31900 | -0270871 | 3-13480 


2-85 | 145-18375 | 0-31806 | 0-0268878 | 3:14405 
"86 | 14939863 | -31713 | -0266935 | 315329 
:87 | 153-75095 | -31620 | -0265040 | 316254 
:88 | 158-24559 | -31528 | -0263193 | 3:17180 
:89 | 162-88761 | -31436 | -0261392 | 3-18106 


2-90 | 167-68228 | 0:31345 | 0:059637 | 3:19032 
:91 | 172-63504 | -31254 | -0257926 | 3:19958 
:92 | 177-75156 | -31164 | -0256258 | 3:20885 
:93 | 183-03773 | -31074 | -0254634 | 3-21812 
:94 | 188-49965 | -30985 | -0253050 | 3-22739 


2-95 | 194-14366 | 0-30896 | 0۰051508 | 3:23667 
:96 | 199-97636 | -30808 | -0250006 | 3:24595 
:97 | 206-00458 | .30720 | -0248543 | 3:25523 
:98 | 212-23545 | -30632 | -0247118 | 3-26452 


:99 | 218-67633 | -30545 | -0245730 | 3-27381 
3-00 | 225-33490 | 0-30459 | 0-0244378 | 3-28310 4 




:19 | 27:18387 | -39331 :036787 | 2.54253 | -69 | 93-07536 | -33371 | -010744 | 2-909658 f 
2-20 | 27-79726 | 0-39193 | 0-035975 | 2.55150 | 2-70 95-62799 | 0-33269 | 0-010457 | 3-00577 
21 | 28:42711 | -39055 | -035178 | 2-56047 | -71 98:26016 | -33168 | -010177 | 3:01497 
:22 | 29:07391 | -38919 | -034395 | 2-56944 | -72 100-97461 | -33067 -0299035 | 3-02416 
28 | 29-73816 | .38783 | .033627 | 2.57849 "73 | 103-77414 | -32967 | -0296363 | 3.03336 
:24 | 30-42041 | -38649 | -032873 | 2.58741 "74 | 106-66167 | -32867 | -0293754 | 3:04257 
2:25 | 31-12118 | 0:38515 | 0-032132 2.59640 | 2-75 | 109-64022 | 0-32768 | 0-0291207 | 3-05177 
26 | 31-84105 | -38382 | -031406 2-60540 | :76 | 112-71295 | -32669 | -0288721 | 3-06098 
:27 | 32-58060 | -38250 | -030693 2۰61440 | °77 | 115:88308 | -32571 :0?86294 | 0 
:28 | 33:34041 | -38118 | -029994 2.62341 | -78 | 119-15401 | -32474 | -0283925 | 3-07942 
:29 | 34-12113 | -37988 | -029307 2.063242 | .79 | 122-52923 | .32377 | -0°81613 | 3-08864 

2-30 | 34-92338 | 0-37858 | 0-028634 2-64144 | 2-80 | 126-01238 | 0-32280 | 0-0279357 | 3-09787 | 

| 


2-82274 




х Q/Z | QIZ 

3-00 0-30459 0-0244378 3.28310 | 350 026057 
01 -30373 -0243063 3-29240 51 | :26590 
-02 -30287 «0241782 3-30169 52 | -26523 
-03 -30202 -0240535 3-31100 53 +26457 
04 -30118 -039322 3-32030 54 -20391 
3-05 0-30034 0-0238141 3-32961 3:55 | 0-206326 
06 29950 | -0°36992 3-33892 -56 -26260 
07 -29866 | 0235874 3.34824 :57 :26195 ˆ 
08 -29784 -0234787 3-35755 58 -26131 
09 -29701 -0233729 3-36687 59 -26066 
3-10 0-29619 00232700 3.37620 | 360 | 026002 
11 -29538 -0231699 3-38552 61 -25939 
12 -29456 | -0°30726 3-39485 62 :95875 
13 :29376 -0229780 3-40418 63 -25812 
14 -29295 | -0228860 3-41352 64 -25749 
3-15 0-29215 0.027965 3.42286 | 365 | 0-25686 
16 :99136 | 6 3-4320 66 -25624 
17 -29057 -0226251 3-44154 67 | 25562 
18 -28978 -0225430 3-45089 68 -25500 
19 -28900 -0224632 3-406024 69 -25439 
3-20 0-28822 0۰023857 3.46959 | 370 | 0-25378 
21 -28744 -0223104 3-47895 71 :95317 
22 -28667 -0222373 3-48830 72 -25256 
23 -28590 -0221662 3-49767 -73 -25196 
24 -28514 -0220972 3-50703 74 | -25136 
3-25 0:28438 | 0-0220302 3:51640 | 375 | 0-25076 
-26 -28363 -0219652 3-52576 -76 -25017 
27 -28287 -0219020 3-53514 47 | -24957 
-28 -28213 -0218407 3-54451 78 -24898 
-29 -28138 -0217812 3-55389 -79 -24840 
3-30 0-28064 | 00317234 3.56327 | 3:80 | 024781 
31 -27990 -0?16674 3-572065 81 -24723 
32 :27917 -0216130 3:58203 -82 -24665 
33 -27844 -0215602 3-59142 83 -24607 
34 -27772 -0215090 3-60081 84 -24550 
3:35 0۰27699 0-0214593 3-601020 | 3:85 0-24493 
:36 -27627 0314112 3-61960 86 24436 
37 :27556 -0213644 3-62900 87 -24379 
38 :27485 -0213191 3-63840 88 :24323 
39 -27414 -0212752 3-64780 -89 -24267 
3-40 0:27343 0-0212326 3.65720 | 3:90 0-24211 
41 27273 0311914 3.66661 :91 :24155 
42 -27203 03115183 3-67602 :92 -24100 
-43 -27134 -0211126 3-68544 93 -24045 
44 -27065 «0210750 3-69485 :94 :23990 
3:45 0.26996 | 0-010386 3.70427 | 3:95 0.23935 
46 :26927 -0210033 3.71369 :96 :23881 
47 -26859 0396911 3-72811 97 -23826 
48 -26791 -0293601 3.73254 :98 :23772 
49 :96724 .0290394 3:74196 -99 -23719 
3-50 0:96657 | 0-0°87289 3.75139 | 400 | .0-23665 


ZjP 

0-0°87289 
0984281 
0981370 
“078551 
-0°75822 


0-0°73180 
-0°70624 
"0568150 
0565756 
0563440 


0-0*61200 
۰059033 
0256936 
0554909 
0252949 


0-0°51053 
0349221 
0347449 
0345737 
0344082 


0-0742482 
0340987 
0389444 
:0338002 
03836608 


0-0°35263 
0238968 
0332708 
0531496 
-0°30326 


0۰03329197 
0328107 
0327054 


‚ 0526089 


0225059 


0-0824114 
0223202 
:0322322 
-0821474 
020656 


0-0519866 
0319106 
0318372 
0317665 
0316983 


0۰016326 
0315698 
0315088 
0314495 


-0313929 
00313383 


221 


3.75139 
3.76082 
3۰77026 
3۰77969 
3۰78913 


3۰79857 
3۰80802 
3۰81746 
3۰82691 
3.83636 


3-84581 
3.85527 
3.86472 
3.87418 
3۰88364 


3-89311 
3-90257 
3-91204 
3.92151 
3.93098 


3.94046 
3۰94993 
3.95941 
3.96889 
3.97838 


3.98786 
3-99735 
4۰00683 
4۰01632 
402582 


4۰03531 
4۰04481 
4۰05431 
4۰06381 
407381 


4-08281 
409232 
4۰10183 
411184 
412085 


413036 
413988 
4:14940 
4:15892 
4:16844 


4:17796 
4:18749 
419702 
4۰20654 
4۰21608 


4-22561 


222 : Normal probability function 


REFERENCES 


NATIONAL BUREAU OF STANDARDS (1953). Tables of Normal Probability Functions, Applied Mathematics Series, no. 23. Washington: U.S. Department of Commerce.

PEARSON, E. S. & HARTLEY, H. O. (1954). Biometrika Tables for Statisticians, 1. Cambridge University Press for the Biometrika Trustees.

PEARSON, K. (1931). Tables for Statisticians and Biometricians, 2. Cambridge University Press for the Biometrika Trustees.

TATE, R. F. (1955). The theory of correlation between two continuous variables when one is dichotomized. Biometrika, 42, 205.




[ 223 ] 


TABLES OF SYMMETRIC FUNCTIONS. PART V 
By F. N. DAVID AND M. G. KENDALL


1. We write M for monomial symmetric functions, S for the one-part functions, U for
the unitary functions and H for the homogeneous product sums. Previously we have given
tables MS-SM (1949), UM-MU, HU-UH (1951) and MH-HM (1953). We now complete
the set, up to and including weight (w) 12, by the US-SU tables. The HS-SH tables are the
same as the present tables as far as coefficients are concerned.

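2. The US-SU tables were partially calculated in the course of constructing
the previous sets. There are many ways in which they can be built up. We have used here
both the D operator technique and the elementary relations

$$n!\,a_n = \sum (-1)^{n+\pi_1+\pi_2+\cdots+\pi_k}\,\frac{n!}{1^{\pi_1}2^{\pi_2}\cdots k^{\pi_k}\;\pi_1!\,\pi_2!\cdots\pi_k!}\;s_1^{\pi_1}s_2^{\pi_2}\cdots s_k^{\pi_k},$$

$$s_n = \sum (-1)^{n+\pi_1+\pi_2+\cdots+\pi_k}\,\frac{(\pi_1+\pi_2+\cdots+\pi_k-1)!\;n}{\pi_1!\,\pi_2!\cdots\pi_k!}\;a_1^{\pi_1}a_2^{\pi_2}\cdots a_k^{\pi_k},$$

where $a_r$ is the unitary function and $s_r = (r)$ the one-part function of weight $r$,
the sum in each case being taken over all possible partitions $(1^{\pi_1}2^{\pi_2}\cdots k^{\pi_k})$ of $n$.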
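
As an illustration of the elementary relations, the sums over partitions can be evaluated directly. The sketch below assumes the standard forms of the two relations (stated in the docstrings) and uses hypothetical names (`partitions`, `as_key`, `s_in_terms_of_a`, `a_in_terms_of_s`); it is not a reconstruction of the computation actually used for the tables.

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Yield each partition of n as a dict {part: multiplicity pi_part}."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield {}
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            p = dict(rest)
            p[k] = p.get(k, 0) + 1
            yield p

def as_key(p):
    """Write a partition as a descending tuple, e.g. {2: 1, 1: 2} -> (2, 1, 1)."""
    return tuple(part for part in sorted(p, reverse=True) for _ in range(p[part]))

def s_in_terms_of_a(n):
    """Coefficient of a_1^pi_1 ... a_k^pi_k in s_n, assumed to be
    (-1)^(n + sum pi) * n * (sum pi - 1)! / (pi_1! ... pi_k!)."""
    out = {}
    for p in partitions(n):
        total = sum(p.values())
        denom = 1
        for mult in p.values():
            denom *= factorial(mult)
        sign = (-1) ** (n + total)
        out[as_key(p)] = Fraction(sign * n * factorial(total - 1), denom)
    return out

def a_in_terms_of_s(n):
    """Coefficient of s_1^pi_1 ... s_k^pi_k in n! a_n, assumed to be
    (-1)^(n + sum pi) * n! / (1^pi_1 ... k^pi_k * pi_1! ... pi_k!)."""
    out = {}
    for p in partitions(n):
        total = sum(p.values())
        denom = 1
        for part, mult in p.items():
            denom *= part ** mult * factorial(mult)
        sign = (-1) ** (n + total)
        out[as_key(p)] = Fraction(sign * factorial(n), denom)
    return out

print(s_in_terms_of_a(4))
print(a_in_terms_of_s(4))
```

For n = 6 the first function gives the coefficients 1, -6, 9, -2, 6, -12, 3, -6, 6, 6, -6, which agree with the row for (6) legible in Table 5.6.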

3. To express the S-functions in terms of the U-functions we read horizontally up to
and including the diagonal figure in bold type. Thus for w = 5 we have, for example,

$$(3)(2) = a_1^5 - 5a_2a_1^3 + 6a_2^2a_1 + 3a_3a_1^2 - 6a_3a_2.$$

To express the U-functions in terms of the S-functions we read vertically downwards
up to and including the diagonal in bold type and divide the coefficients by w!. Thus

$$a_3a_2 \cdot 5! = 10\,(1)^5 - 40\,(2)(1)^3 + 30\,(2)^2(1) + 20\,(3)(1)^2 - 20\,(3)(2).$$
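
The two worked examples for w = 5 can be verified by multiplying out the expansions of the individual functions. The sketch below is illustrative only: it builds s_n in terms of the a_r, and a_n in terms of the s_r, from Newton's identities (an assumed route, not necessarily the authors'), and the names `mul` and `newton` are hypothetical.

```python
from fractions import Fraction
from math import factorial

def mul(p, q):
    """Multiply two polynomials whose monomials are tuples of subscripts,
    e.g. (3, 2) stands for a_3 a_2 or for (3)(2), depending on context."""
    out = {}
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            m = tuple(sorted(m1 + m2, reverse=True))
            out[m] = out.get(m, 0) + c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def newton(w):
    """Return (s, a): s[n] as a polynomial in the a_r and
    a[n] as a polynomial in the s_r, for n = 1, ..., w."""
    one = Fraction(1)
    s = {}
    a = {0: {(): one}}
    for n in range(1, w + 1):
        # s_n = sum_{i<n} (-1)^(i-1) a_i s_(n-i) + (-1)^(n-1) n a_n
        s_n = {(n,): Fraction((-1) ** (n - 1) * n)}
        for i in range(1, n):
            for m, c in mul({(i,): one}, s[n - i]).items():
                s_n[m] = s_n.get(m, 0) + (-1) ** (i - 1) * c
        s[n] = s_n
        # n a_n = sum_{i<=n} (-1)^(i-1) a_(n-i) s_i
        a_n = {}
        for i in range(1, n + 1):
            for m, c in mul(a[n - i], {(i,): one}).items():
                a_n[m] = a_n.get(m, 0) + Fraction((-1) ** (i - 1), n) * c
        a[n] = a_n
    return s, a

s, a = newton(5)

# Horizontal reading: the S-product (3)(2) in terms of the U-functions.
row = mul(s[3], s[2])

# Vertical reading: 5! a_3 a_2 in terms of the S-functions.
col = {m: factorial(5) * c for m, c in mul(a[3], a[2]).items()}

print(row)
print(col)
```

Here `row` holds the coefficients 1, -5, 6, 3, -6 and `col` the coefficients 10, -40, 30, 20, -20, keyed by the partitions (1,1,1,1,1), (2,1,1,1), (2,2,1), (3,1,1) and (3,2).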
4. We have called attention previously to the use of symmetric functions in distribution
problems. These present tables will be useful for that purpose. They will also add
considerably to the flexibility of the symmetric function system, for with this concluding
set it will now be possible to express any symmetric function in terms of any other.


REFERENCES 
DAVID, F. N. & KENDALL, M. G. (1949). Biometrika, 36, 431.
DAVID, F. N. & KENDALL, M. G. (1951). Biometrika, 38, 435.
DAVID, F. N. & KENDALL, M. G. (1953). Biometrika, 40, 427.




Table 5.2 Table 5.3 Table 5.4 


w=2 w=4 
ау ay a 
@) (а) (0) (2) (1)* 
E 3 
G у ; 
(з) Q) "m | 


(4) 
Table 5.5

w=5               a_1^5   a_2a_1^3   a_2^2a_1   a_3a_1^2     a_3a_2     a_4a_1        a_5

Table 5.6

w=6               a_1^6   a_2a_1^4 a_2^2a_1^2      a_2^3   a_3a_1^3  a_3a_2a_1      a_3^2   a_4a_1^2     a_4a_2     a_5a_1        a_6
(1)^6               720        360        180         90        120         60         20         30         15          6          1
                      1
(2)(1)^4                      -360       -360       -270       -360       -240       -120       -180       -105        -60        -15
                      1         -2
(2)^2(1)^2                                180        270          .        180        180         90        135         90         45
                      1         -4          4
(2)^3                                                -90          .          .          .          .        -45          .        -15
                      1         -6         12         -8
(3)(1)^3                                                        240        120         80        240        120        120         40
                      1         -3          .          .          3
(3)(2)(1)                                                                 -120       -240          .       -120       -120       -120
                      1         -5          6          .          3         -6
(3)^2                                                                                  80          .          .          .         40
                      1         -6          9          .          6        -18          9
(4)(1)^2                                                                                        -180        -90       -180        -90
                      1         -4          2          .          4          .          .         -4
(4)(2)                                                                                                       90          .         90
                      1         -6         10         -4          4         -8          .         -4          8
(5)(1)                                                                                                                 144        144
                      1         -5          5          .          5         -5          .         -5          .          5
(6)                                                                                                                              -120
                      1         -6          9         -2          6        -12          3         -6          6          6         -6
Table 5.7 
w=7 а ааб аға? аға аа* ааа Gu) ata, аа? шаа aa, аа? аа аа; а 
ay 5040 2520 1260 630 840 420 210 140 210 105 35 42 21 3 1 
т 
(а) (1)* 22520 -2520 —1890 —2520 —1680 —10$0 —840 — 1260 —735 -—315 -—420 -231 —105 -21 
1 -2 
(а)* a} 1260 1890 + 1260 1470 1260 630 945 735 630 525 315 105 
I =4 4 
а , . —630 б * “315 -315 * “31S -—105 -—10$ 
AUR м „== 
(з) Q)* 1680 840 420 56o 1680 840 350 840 420 280 7° 
1 =з . 3 
" - —840 —168o —840 ~1260 —840 -840 -840 ~420 
pensar уз у. ож. шш 
(3) (a)* 4 — 120 . 210 ar) . aro 
Li E 16 ~2 $3 =12 12 
(а)* (2) o + 56o . . 280 280 
r -6 9 6  —18 . 9 
Way =1260 -63o ~210 -1260 —630 —630 -210 
І -4 2 4 * -4 
630 . 630 630 630 
(4) (2) (1) "M4 Ma 2s » wd eh — 
(4) (3) ы Sm 
м r -7 14 -6 7 -24 6 12 -4 و‎ 
(5) (1)* _1008 боф 1008 504 
r T$ 5 5 ш] Ж . 5 
) (2) 2504 = 504 
(5) (2) 1 -7 I$ =10 5 -i5 10 -5 то ^ s -10 at 
6 - - 
GOTT IE S e = 
(7) 1 “7 " 




Table 5.8 


(а) (1)* 
(2) ()* 
(а)? (1)* 
(а)* 


1 (з) (° | 


(3) (2) (° 
(3) (2)* (1) 
[Odd 
(3)* (2) 
(4) (° 


(4) (2) (1)* 
M 


| 10080 15120 15120 e 10080 11760 10080 8400 $040 
| — 


| -34 6720 3360 4480 2240 13440 
Li -3 3 
| cx —6720 -13440  -—8960 
H -5 6 3 - 
юе : س‎ 
r - 16 ~13 3 -12 12 
1 ET 
£ -6 9 6 -18 . 9 
1 -8 и -18 -39 36 9 ei 4. 
3 2 à چ‎ є -4 
х J 10 -4 -8 -4 
r -8 22 = 8 
ї - 14 = 
H = 20 -16 4 


ооз Oc aC ++ 


CU 


(а) ба) 0) | 5 


(4) (2)* 16 ==) t N bs eats —3360 —3360 
(4) (3) 00) = ME І . 4 і 
o 34 Si =з 28 8064 4o32 1344 806 4032 4030 1344 
(o E - —4032 . EE —4032 —4032 
(5) (2) (1) i9 А sope Doa | k . — a688 
(5) (3) 15 -15 » g^ =35 15 -érzo -330 -6m0 -3360 
он HS ul oo бона ЕЧ" 
(6) (2) m o Ss ^ 6-12 . T6 та EE 
с) а) Р HAUT : 70 
(8) 8 




w=9 (i)


ay 
(а) GY 
(а)? (1)* 
(2) (1)? | 
(а)* (1) 
(3) (x) 
(з) (2) (1)* 
(з) (2)* (1)* 
(з) @* 
(з)? (1)? 
(з)? (а) (1) 
(3) 
(4) (° 
(4) (2) (1)* 
(4) (2)* (1) 


Фа 


o | i 


Qi 


(4) (3) ()* 
(4) (3) @) 
(4)* (0) 
(5) (1)* 

ts) (2) (1)* 
(5) (а)* 

(5) (3) (1) 
(5) (4) 

(6) (° 
(6) (2) (1) 
(6) (3) 

@ ay 
(7) (а) 
(8) (1) 
(9) 


a 


Ga, a, 


аага? 


аа? 


aad а аа ада 


- 35280 — 15120 


__90720 136080 136080 90720 105840 90720 90720 75600 45360 45360 68040 
1 -4 4 
Ex —90720 —45360 —75600 + 745360 —45360 . 722680 
1 —6 12 - 
22680 22680 
ї -8 24 -32 16 
.120960 60480 30240 15120 40320 20160 10080 120960 60480 
1 -3 3 
-60480 -60480 —45360 —120960 —80640 —60480 — 60480 
1 -5 6 3 -6 
30240 45360 60480 — 90720 
1 -7 16 -12 3 -12 12 
—15120 
1 - 30 -4 24 -18 6 -24 
2 З 3 40320 20160 20160 
1 -6 9 6 -18 9 
-20160 —60480 
1 -8 21 -18 6 -30 36 9 -18 
13440 
1 -9 27 -27 9 —54 8r 27 -81 27 
290720 —45360 
1 ~4 2 + А - 
45360 
1 -6 10 -4 4 -8 * -4 8 
1 -8 22 - 8 4 —16 16 t 5 -4 16 
1 -5 1 = 2 7 - 6 а 12 € ^ -4 12 
1 - 2 - 12 -3 -12 12 =24 : - 20 
1 28 20 -1 4 H -32 1 e 16 M P 21 32 
1 -5 5 М ч 5 =s 1 $ -5 i 
1 -7 15 —10 5 —1$ 10 . . =g 10 
1 x 29 —40 20 Ч -25 40 —20 Е Е : -5 20 
1 - 20 —15 . —35 3o H 18 -15 V -3 15 
1 -9 27 —30 10 9 —45 50 —10 20 —20 Н -9 40 
1 —6 9° -2 6 -12 5, з а è -6 6 
1 -8 21 —20 é 6 - 24 ё 3 -6 © 2 —6 18 
1 =9 27 -2 9 = 63 -6 21 -45 ^" 9 —6 24 
1 =} 14 -7 К 7 —21 7 Е 7 3 А 27 14 
1 = ET 14 РА —35 49 —14 а ER ^ - 28 
1 - 20 - 2 -32 24 * 12 - D ч 
1 -9 27 =» 9 9 —45 54 =9) 18 — —27 3 zi) 3 
ала аа“ ааа аай аат а а.а ааа аа аа? аа аа а 
2520 12бо 630 3024 1 pa 756 504 126 504 252 84 72 36 1 
—226080 —1 —75! E —16/ —9072 ees —2016 —7560 —4032 —1512 -1512  —792  -—252 —36 
ne cll ved 453 37820 27216 22 9828 22680 15120 7560 7560 4530 1890 378 
- — 22680 —22680 —30240 —22680 —15120 —7560 —1$120 —12 —7560 —75 -3780 —1 
5670 Е 11340 Ы 5670 = 3780 3780 3780 945 95 
25200 i ообо 60480 o2. 15120 11088 3528 20160 10080 3528 5040 20 тоо! 1 
—90720 —57960 — о —60480 E —45360 —50400 —27720 —60480 —40320 —22680 —30240 07960 —10080  —2520 
15120 52920 30240 + 30240 45360 45360 3700 + 30240 37800 15120 680 15120 7560 
« —7$бо А . —15120 —7560 б . -2$20 . —7560 + 72820 
40320 20160 40320 20160 20160 20160 10080 10080 20160 10080 10080 3360 
. —20160 + —20160 —20160 „ —10080 Paper . —10080 — 10080 c 
Е 2 . К x . А = B Е 720 3 а М 2: 
—15120 -—7560 —7$бо —90720 —45360 —22680 —15120 —4536 —45360 —22680 -—7560 —15120 gue —3780 —756 
45360 30 45360 . 48360 45360 45360 30240 45360 45360 30 45360 39240 22680 75 
. — 22680 — 22680 . — 22080 . —2268о . —22680 —22680 . — 226 —11340 —11349 |. 
130240 —15120 — 6i . —30240 —45360 —15120 —30240 —1$120 — 30260 —1$120 
-12 
15120 15120 . 15120 15120 . 15120 
-12 24 
22680 22680 11340 11340 
-32 16 
36288 18144 12096 3024 72576 36288 12096 36288 18144 12096 3024 
5 P 
236288 —36288 —36288 —18144 —36288 —36288 —36288 —36288 —36288 —18144 
5 -10 d f 
18144 9072 18144 9072 
5 -20 20 
24192 24192 24192 24192 24192 
-15 . 5 -15 18 
718144 t —18144 
-40 ¢ 20 5 -20 10 20 -20 
150480 — 30240 10080 —60480 — 30240 —30240 — 10080 
. 6 В -6 
30240 30240 + 30240 30240 30240 
. 6 -12 . . -6 1a 
720160 . —20160 
-18 18 . 6 -18 18 . -6 18 EI 
51840 25920 51840 25020 
-7 . . -7 . 7 . 7 
- 25920 ‚ —25920 
-7 4 . 7 -21 14 . . me 14 7 714 
8 
9 


ю=то@) | a; 


(а) (1) | 
(а)* (1)* 
(2) ay 
(2)* (1)* 
(2) 
(з) (x)" 
(3) (2) (1)* 
(3) (а)? (1)* 
(3) (а)* (1) 


( y 4 
G PY 
as » 
eus 

$ 

d O 


мнмын мым кыны Da р ра на 


ннн 


re Чеш 


gy | Ж ЖЕ 




Table 5.10


4 
zastoo 907200 —1134000 . 
12 - 
24 -32 16 
E " 
40 -80 80 -32 
1209600 
. . . 3 
6 . . 3 
16 -12 3 
30 -44 24 3 
21 ET 6 
37 —60 36 . 6 
27 -27 9 
2 " А 4 
10 -4 : . 4 
яу $ ё 4 
z= 5 -1 
i 26 ; E 2 
= 5 ho 
— 1 . 
Hs —16 1 8 
36 —56 3 -8 8 
5 5 
15 —10 . 5 
29 -4 зо У i 
20 —15 А 
36 —55 . 25 
27 —3° 10 9 
35 —5o 25 . ло 
-a А 6 
21 —20 4 '6 
37 -62 ` 4 -8 6 
27 -29 . 29 
35 —50 26 -4 10 
L -7 г . e2 
ще ео a 
35 -49. ү 8 
36 —56 34 “4 З 
27 —30 BK 
35 —$0° 2$ -2 то 


% 


302400 151200 
—1209600 ~ 756000 
go7200 1058400 
7453600 
604800 302400 
= Sostoo — 604800 — 453600 
юм» 
-12 
-18 6 
E 5 
=30. 3 
з [d 
EPA 
-16 P: 
E 6 
— 54 
ы 120 
- 1 
x3 s 
aş К 
=15 10 
-25 40 
-35 3o 
E . 100 
—45 
- тоо 
+12 FAI 
= Я 
-21 ү} 
735 | 49 
А 112 
248 Ч 
-45 54 
- тоо 




~~ 


get 
Фф [OUO] 
GY Gy 

(G) (2) (1)* 
(э)" (a) 

GY (1) 
(OLON 

(4) (а) (1)* 
(4) (2)" (° 
(OLOH 

(4) (3) а)? 
(4) (3) (2) (1) 




Table 5.10 (cont.) 


SSERARE LRSSo 055559. 


(з) (1)" 
(3) (2) (1)* 
à (а)? Gy 
3 


GG | 


o" » i 
By (0) | 


we i 
«Or D | 


w b) s Q) | 
(4) (3) | 
(4 0) 
(4° (2) 

(5) (1)* 
(5) (2) GY 
(5) (2) (1) 
(5) (3) 0) 
(s) (3) (2) 
(5) (4) (1) 
(5)* 


(6) (1)* 
(6) (2 mn 




Table 5.10 (cont.)


= 
ouuu cocoa P 


Г 
ос I 


H H^ 


ii 


fi 


201600 
= 201600 


i 


Uu 
r 
8 


зл BE 


Ra. 


СЕРР 




Table 5.10 (cont.) 


w= 10 (iv) 


151200 
453600 


— 302409 


90720 
241920 
-18 1440 


(6) (x)* ¬ 100800 
oww 302400 . 302400 


(6) (2)* 


(6) (3) (1) 


مھ | 


Qo 
(00 (а) а) 
(7) (з) 

(8) (x)* 

(8) (2) 

(9) (1) 
(то) 




Table 5.11 


чай 
wert (i) a," aa," а?а! а?а" a'a’ ata, aa," ааа аара ага! аа 


(хуч 
(а) (1) 


39916800 19958400 9979200 4989600 2404800 1247400 © 66s2B00 3326400 166узоо 831600 415800 
Ы 219958400 -19958400 —14968800 — 9970200 -6237000 -19958400 —13305боо -8316009 -4989600 —ag10600 
2 


eray 9979200 14968800 14968800 12474000 . 0970200 11642400 9979200 7484400 
1 -4 + 
А zi —9979200 —12474000 7 + 4989600 -8316000 — 9147600 
а G | ^ ч 3 Ilem ^ 
Я 37000 ; м . 2404800 5405400 
E 1 -8 24 -32 HM 1247400 
(а)? (1) ; 255 s =e 80 ue 


(3) (1) 


: Е; у : ` X - -665:800 -4989600 -3326400 
O , = : а A i - 


| 4989600 4089600 
(3) D Г -3 16 Y 3 =12 aes эз 
(з) (2)? (1)? | " Sg 30 -44 24 3 -18 36 -24 
‚| щн 
(3) (2)* | I -11 48 -—104 112 -48 i E т —96 
Ko t M mr. : ; Е aa : : 
» à o | i -10 37 AE 36 е =e 96 -72 
GFE = 2 - . = ` . 
А | i E 5 —81 s 9 S 189 . - vx , 
) (2) | ї - 10 =% n 4 -8 y 2, 5 
а (ууа -8 = 8 4 =16 16 * б 
б í -10 38 -6 56 -16 4 -24 48 P 4 
j ) (x I -7 1 -6 " " 7 = 6 A j f 
lode qus 1 Fre 3 =x D ej KS та 24 
28 (0 I -10 35 -48 18 à то ' - 120 -3 64 б 
| ste т E E. | PLE 2" 2H EMEN Е 
9 2 1 Zir E - 32 -2 п -® 172 -96 12 
s А > Š 5 - M -H 
3 í 2» 15 -10 X Я 5. = то ER Ww. 
$$ D) I -9 29 -40 20 P s 1—25 40 
à 1 -11 47 100 -40 { а; 9o —100 40 
E 16 » І -8 20 Єз . H БЕН " Ea E D Н 
ch o H Zir e 25 as. Я п -83 195 -135 | 
ф Ys 1 -9 27 E = En : es 150 “по 20 
) da : = a —50 25 " 10 -60 100 -50 
af Qt ожо ооо ОУК ВИ Е 
: А 2)* (1) | у —10 37 —62 44 -8 6 = 72 pe . 
= 63 ы X 
ҮҮ 1 AJ 27 = 6 5 9 x Ee H 
Bi : ate 1 EN $ 24 18 —6o Е -32 ; 
Oe : at HH A 55 =10 тї -7 165 -ris 10 
1 -7 1 p " Е d me ۳ -14 
(7) (2 aer 1 -9 Ei z3 H _а Ü z3 119 -112 28 
( 2)! Ж ~11 46 S P н yA 26 nm Ese 
7 RA E EEE $^ -u S. cms Gm c9 4 
à 1 -8 20 =1 2 М 8 32 ж P 
шо cz imu йан c СЕГЕ Fer i 
99 1 29 ы ә > 1 leer - М im d 
g $: E 69 -1% 9 -63 14 117 
eat І E M M as -2 то -60 dx ain H 
m Н -и 4 -7 55 -и п ie! od Ne 




Table 5.11 (cont.) 


w= r1 (ii) | ёа ааа) агага; аға? а?а, ааг’ ааа‘ 2,0, a, 4,0; à ааа 
су, 1108800 277200 184800 92400 1663200 831600 207900 

e т)? aeree - 5 bue Noi = —s979200 — 3821200 E 1871100: —24 4800 um 

e тр "4989600 6652400 4080600 — 3 * -Дойю - -Satzco -щўко - diem 

VO : : oon + 2494800 А + — 1247400 328950 : M 

4435200 2217600 1108800 1108800 13305600 6652800 5 166 i 
quim imb xm ШЫ quc ое шш шы ШО ыш әш 

2)* (x)* х - —3326400 + . 4089600 5 : с Боа 3 ДШИ 

(з) (2)* . : ? х а a ; : 

(3)* (° {з= 2217600 1108800 2217600 1108800 є Р " ^ 4435200 2217600 
(з* (а) (* : = 221 - —2217600 — 6652800  —4435200 . > . З + — 2217600 
(3)* (2)? (1) : m 1108800 . 3326400 

Gra s MTS Wess 

7 —81 v 27 
(3)* (2) e =e YE B свән 
(Gy 19979200 — 4080боо -2494800 — 1247400  —-1663200  -831 
A 5 f Я -4 
(4) (2) (0* ae 4989600 3742200 4989600 3: 
г s є " " -4 
(4) (2)* (1)* И = — 3742200 . 48و24‎ 
E . . . . -4 1 -1 
(4) (2)? (1) 2 1247400 ; 
5 А A d А -4 24 = 32 
(4 (3) (° 12 s а = : -4 12 = 12 E 
| uou | | 
i 12 - 48 1 : 24 28 =з 8 xia 
A "reme M MEC M EE MET 
2) 1 -32 5 * A -8 = 2 m " 
(б | 49 - 144 45 48 * = ; 4 - uu 38 - % * 
oS x ES 35 —20 Ў К 
oit m quas eese 
15 15 . x 25 
or эшо) 1$ =45 зә 3 1 E 25 -* à 21 
^ 6 | ү 39 - 26 135 45 -45 -$ E: 5% E 238 
o ү ао 66 © ; 2 28 г ОВАН о. -5 EM 
ci -50 25 . . 535 so -50 5, = 50 
ai A 3 -6 ч ; н =6 18 -12 с 
-12 12 " * -6 зо -48 24 
) (1)? ar —45 x 9 ; -6 —18 -18 
NEC TEE puo: j 
^ 33 105 75 15 - 15 ks ё ES 98 ® 1 
ty 7 24 s -2 E: -28 8 2) 
(7) 3) g 4 - 21 21 zy S 28 s 74 
К 35 -112 42 ET: 70 = 56 -6 E 
12 -8 -8 24 -8 E -1 
© à) ed 12 БЕ -8 
36 140 3% -4 -8 48 zb ря -- 
18 -27 -9 % -27 Е; 29 
Ё 


16 
d 2) 18 -63 s 2 -6 
(10 & t) 5 H $ 9 




Table 5.11 (cont.) 


wrt (iii) ааай 


Gy 69300 


d 
8a 


(3) (2)* 
!ü» 


Ф «о 


)(2)* (1) | 
WOR 
(4) (3) (а)? | 
(4) (3)° (1) 
(9) 
(4)* (2) (1) 
(4)* (3) 
(5) (° 
(5) (2) (1)* 
(5) (2)* (1)* 
(5) (2)* 
(5) (3) (x)* 
(5) (3) (2) (1) 


Ba 
allt 


E Be (1) 


- 
nn л 


oosa ee ne 


ORE 
o 
вй) 


Baeosra aee 


284 


w=11 (iv) 


| 
ge | 


Oey 


" 
, 
7 
s 
э 


1 
o» d 
3, 


Фф „|| 


) (а)* (1) 


E 


m d з 
К 
КО; 
(5) ()" 

(5) (4) (x)* 
(5) (4) (2) 
"ue (1) 
(6G) 

(6) (2) (x 
OGG) 
(6) (3) а) 
(6) (3) (2) 
(6) 0) 
(6) 


vii 


e D \ 


Table 5.11 (cont.) 


3991 680 


= 3991680 




1330! бо 
—3991 


2661120 


2079000 
2910600 
1247400 
—831600 
1663200 
—831600 


665280 
— 2661120 
1995840 


BES 
= 554400 
2217600 
- 1663200 
— 1108800 
1108800 


831600 
— 2772000 
277200 
739200 


—2217бо 
1663200 


rm 
31600 
—2217600 
3326400 

= 554400 
1247400 
712474090 
332649 
—1995840 
997920 
2661120 


— 1995840 




Table 5.11 (cont.)


'920 


"Sardo 


— 831600 


8 - 3320400 


2)* (1)* 1663200 
Зб" 


2)* | . 
oa d 2217600 


- 1663200 
(4) (2) (1): | 4989600 
Oe, (1)? * 
4) (2)* (1) 


) - 3326400 — 1663200 
| P: Wi ss + 1663200 
2 3) (2)* | б . 
р" (C? 
e ea 
$ А 3991680 


е 


. —498960 
У cx o 5 о 1330560 
с F оо р > 88 : = 1339560 


- 6652800 


5702400 
7 
7 
7 
7 
7 
8 
8 
8 
9 
9 


8 2 N ( D 
МИ 
(7) (1° 
(7) (2) Gy 
(OLOH 

(7) (3) (0) 
(eG 

(8) (1)* 

(8) (2) (1) 
(8) (3) 

(9) (x)* 

(9) (2) 
(10) (1) 
(11) 


2851200 
— 2851200 
- 2851200 — 2851200 
-14 : Р 1425600 
—28 


=21 


Table 5.12 


а" азар а?а? aja atat aša? а аа ааай аала аадар | 
! 
(1) | 429001600 239500800 119750400 — 59875200 20037600 14968800 7484400 — 79833600 39916800 19958400 99 / 
x 
(2) (° ; MM — 239500800 —179625600 —119750400 —74844000 — 44006400 —239500800 — 159667200 — 99792000 — 50875200 
(2 (1)* : mme 179625600 179625600 149688000 112266000 + 119750400 139708800 119750400) 
-4 
GP е ^ З 259875200 ~ 110750400 — 149688000 — 149688000 * + — 59875200 —997920 
= 1 = 
(2)* (a) a 74844000 112266000 $ > + 29937600] 
1 -8 24 —3a ET 
Go зараи — 44006450 
1 —10 40 — 80 80 -32 
" d А 
@) 1 -12 бо —160 240 —192 M 
0» 159667200 ^ 79833600 39916800 19958400 
1 -3 . А ^ ^ - 3 
(з) (2) (1)? ; = К F 2-0 — 79833600 — 50875200 
(3) (2)* (1)* 39916800 59875200 
1 -7 16 -12 P ê . 3 -12 12 
a) 19958400 
(3) ¢ des Ё 20 e 55 ú : 2 $ ун 36 - 
(3) (2) (1) 1 -11 48 —104 112 -48 i - 72 - 
)* (1 a -6 9 2 -1 ; 
+ 1 -8 21 =38 å 6 —30 36 К 
1 —10 7 - —42 = 
1 -12 = —134 106 =72 6 Fi E -264| 
i n. i5 ub | Яй 5 ЖО 
1 -12 54 — 108 it 12 —108 324 = 324) 
1 -4 > 4 ‘ ё 
(4) (2) (т)* 1 -6 - 5 -8 . 
Е 7X Us cee 2 a 
1 =10 3 - -1 5 4 —24 
(4) (2)* 1 -12 58 -r 192 —128 32 4 —32 96 — 
) 1 -7 4 - x A c 7 - 6 : 
1 -9 -34 12 7 E 54 714 
1 = 46 —90 8o -24 7 -52 130 = 120) 
1 —10 35 —48 18 10 - 120 
1 —12 55 -118 114 —36 10 —86 252 —276 
1 -8 20 =16 4 8 = 32 16 
1 —10 36 — 56 6 -8 8 E 8o -32| 
1 -12 56 23 us —8o 16 8 - 176 -192 
Y -11 44 - $2 -12 u —8о 172 = 
1 -12 54 —112 108 -48 8 12 —96 240 —192) 
1 =s 5 . 5 -5 x 
т = 15 —10 . 5 -15 10 
1 -9 29 —40 20 5 Sas 40 =й 
1 -n 47 —98 100 740 Н -35 9o 793 
1 -8 20 -15 А =95 30 
1 —10 36 m зо 8 -51 100 =60 
ї =12 56 —127 140 —6o 8 -67 202 = 260 
1 -11 m —75 45 п —83 195 - 
1 -9 27 =a 10 9 x45 5o <a 
1 -—11 45 - 7° —20 9 —63 1 m 
1 —12 —зїї тоо БЕЯ 12 - 266 = 250 
(1 1 —10 5 = 50 25 10 - 100 > 
a r =i 5 = 120 125 E 19 —8o 220 - 250 
(6) (2) b 1 -8 21 -æ 4 6 см 24 - 
6 (@ (1)* 1 -16 37 -@ 4 -$ 6 =36 72 -48 
1 -12 5 —136 168 - 6 6 - 1 — 193 
1 -9 5 7:30 6 % 3 9° = % E 
1 -и 45 -83 64 -12 9 —66 159 ZW 
1 -12 —110 9: -18 12 —102 =. 
1 -10 35 -50 2 - 10 —бо 96 38 
1 -12 55 —120 126 = 8 10 —8o 216 7224 
1 -1 m -77 55 - to Д ir -7 165 —11$ 
ny Says Í c s - d 108 —36 4 12 zi 252 -240 
(7) @ i» 1 ~9 d —35 14 а E 7 —35 49 = 
2) (a)* (1 1 -n 6 - -28 - 118 - | 
{ )t n 1 -10 * 2% "d И i -63 12 
2 3) (2) 1 -r2 55 = 119 m -42 10 -83 238 M 
7) (4) (1 1 = 44 „жи 2 -14 1 -27 161 = 98) 
1 -12 54 —112 105 -35 12 —96 252 = 245 
1 1 mE 36 E 2 8 -32 24 E 
8) (2)* 1 -12 56 158 we =a 8 8 zd 184 -2 
1 -=n 44 —76 50 -6 u —Во 180 — 120) 
1 -12 54 —112 106 -40 4 12 —96 248 7224 
1 -9 27 —30 А —45 54 E 
1 = 45 zh H -18 9 Eg 144 207 
1 -12 Tg 99 -27 12 - 270 Е: 
1 -10 35 =50 25 -2 10 - 100 т 
1 -12 55 —120 125 E 4 10 -80 EL 
1 -1 "m -77 55 БП n - 165 ->n 
1 -13 54 -112 105 —36 2 12 = 252 s, 


Table 5.12 (cont.) 


ma (ii) | 


d 


«) { А 


wr бу) (1) 
4), 


| „б . 
ү! 
ex 
ISINA 


©) 
ө 
BOR 
oO 


199 ) 


E dines - 
75200 ins E 


к Seep 


. —39916800 — 66528000 
. aes 19958400 


16800 — 
-59875200 E 


958805 uem set „ОЁ 


1 зоо о 39916800 
jx : caes T 
Eo 
(3) (2)* (1) | 
(з)* (° 
(3)* (2) (1)* 
(3)? (2)* (1)" 
(3)* (2)* 
[COH 
(э)? (2) (1) 
(3)* 
(4) 00)" 
(4) (2) (1)* 
(4) (2)* (x)* 


. 7$9875200 баң 
13305600 6652800 26611200 13305600 8870400 


= 26611200 —26611200 — 19958400 — 79833600 —53222400 — 53222400 
Lm 19958400 
-36 


NN 8870400 
| 7 __~ 8870400 ~ 35481600 


39916800 — 79833600 


E. s Dm 


ME 


PISA Shobbrobot FERES 5 


Table 5.12 (cont.) 


wara (iii) | aa" a," аа ааа’ 444,0,4, жада ааа? 440, à; a" a a, aaa," a aî 
^ 2494800 831600 831600 Soo 
aise TER a 13414000 LE - үнөн обоо isses -ф зоо — 9919206 _ обоо rr 
її ees WE 11 16632000 34927200 22453200 13920300 
i p -89854400 E 2 -29937600 = 739916800 - 33204000 Dies per 732432400 mno 
а, H а 37422000 53638200 32432400 14968800 27, p oo 18711000 25571709 
2) )2( —7484400 —22453200 o € =% . 3742200 — 11226 
mc м Pin 58400 22 оо 33264000 16632000 8316000 6652800 26400 _ 6652800 3928429 
"bus - = uy = = =. = 6 = 
Sat: MES EU NE Ge due NS MES mus 
® (2)? (1)? | 19958400 — 39916800 + 9979200 —39916800 —19958400 — 53222400 + —19958400 — 39916800 
3 Gy А * 9979200 : - 4989600 . 9979200 . + 9979200 
" . 53222400 26611200 13305600 19958400 9979200 53222400 26611200 13305600 
grt 3f . - + 726611200 —26611200 — оңадобо —43243200 + —26611200 — 20011200 
(3. Baa 1 * * $ 13305600 52800 e + 13305600 
OH . à 2 > < + 73326400 ^ 1 
(ya) ш) à : : : E mn r ; 
(4) (1)" |- 14968800 —7484400 -19958400 -9979200  —4989600 —3326400 —1663200 -9979200 —4989600  —2494800 
j^ 44906. 400 6800 24948: 642. 400 
é oe ў бо E жне 20957600 — 349 200 -2093 -24948000 eri -44008400 -39916800 
waa 2 2993 3 А 14 а 1. 800 E 29937600 
- . . . . . ‘ . —7484400 
war "PL 
(4) (з) (1)* = 39916800 —19958400 —9979200 — 13305600 —6652800 —79833600 —39916800 — 19958400 
E -12 
(4)()(2)(x)* А ‘ ds exi 19958400 39916800 26611200 + 39916800 39916800 
(4) () z^ І ЕЎ E. + — 19958400 : + — 19958400 
Mora à : -- . 72 - ET 001 
(4) (3)* (2) 74 М cen 120 Sun ЕТ 522 
29937600 14968800 7484400 
(9* ot А . -32 * E . : 1 = 14968800 ro 
- = 14968800 
(4) (2) (x) 32 . -32 64 . . б 6 -32 
(9t c» 192 —64 -32 128 —128 7 ч 16 —64 64 
[TOTO 
48 А -56 —48 - . 6 - : 
g | 192 -48 24% 384 = 192 28 . 48 =й 96 
o i n * > қ E E 5 a x : 
§ t К 
Кет 
er ^ P ў -15 зо 2 $ 
ron 6o ^ -15 6o —бо # А 
. . Ee a 5 -45 E 
( | бо Н - -40 " г - 
j CR 9o . -67 iu - Hd -120 6o A = 
Yt Á . . = 50 50 ә > 5 25 
Be \ 100 -50 150 = 100 " @ 2$ —50 
©) : > . . s 2 
[ys | 24 К { 2 
609 un ў se 5 -1$ 18 s 
eom E З 18 Э. 
eo 8 dE (M. naa р ae UNE ETT ; 
Үү i 164 ~40 - 168 ~144 -12 24 24 -72 48 
5 40 . - 120 —30 -1 М 30 =30 Н 
e 132 -24 -72 216 -14 - 36 36 -72 36 
1 ; k -7 . s 2 . > б ' 
(7) Bt . . =? 14 
7) (2 6 . - 28 -28 + . 
1 it T x À Ee 63 z -ai H » К 
. - 11 =r - ‚ А 
Ё Se) 8 ZUR u.c. ст з. Т 
105 . E? 245 = 140 -70 35 2 70 
8 б ‘ - à -$ 
oi |, 128 وود‎ 315 " =64 ‚ 1 56 16 
BR 144 -24 - 288 26 oats و“‎ a 21 40 
(9) (1) . -2 8 à > 
© [у РЯ E з فو‎ ; ШЕ: 
5 5 1 = 207 7135 -81 54 9 ~27 
(10) (1 10 40 6o 4 -10 М 15 -10 
m G 130 -20 -40 140 -120 -310 20 15 -40 20 
11) o m! 132 -38 —-33 ; 22 -33 
(12 120 -12 -72 240 -i4 = 36 30 -72 18 


| 


өзи | аё 
1)" 34650 
(a * —623700 
Gein es oso 

2} (1)* | — 

2)* B 12162150 
2) (1) | — 5613300 
a" 35550 

Qu 1), 3! 
79979200 
| Фе) 1) | 34927200 


o (ау — 29937600 
DG 7 Mo 
t: 6652! 

ї 


» = 39916800 

d а), 1) | 19958400 

3 (n | : 

8 ^| 17740800 
IOONE ч 
wG) | -623700 
7484400 

— 26195400 


@ (2) | 
et 


-22453200 
1122! 


2 о 


(4)* (2) io » 
(s 3 % 
(4° 


(5) GY 

(5) (2) (1)* 
(5) (а)? (1)* 
(5) (а)? (1) 
(5) (3) (x)* 
(5) (3) (2) (1)* 
(5) (3) (2)* 
(5) (3)* (1) 
(5) (4) 00)" 
(5) (4) (2) (1) 


b TEE 
ЫШ О: 
or Gr 
3 ш Am 
Wan o: 
БШ: 
Bevin 
T 


2 ME 
(8) i | 5 


Ә-ә 
) Y » " 
eB: 
"m 


Table 5.12 (cont.) 


аа! ааа’ аага) aea ааа ааа" 
= o r 997920 66: 

16800 epee 711975040 -Өф үзө em F. 
75200 35925120 23950080 29937600 19203120 
. = 29937600 — 3991 ES —29937600 – 299: 

. . 14968800 1 
79833600 16800 1 58400 E 36160 7318080 
— 79833600 E —59875200 — ا ا‎ - 
3991 59875200 5200 5200 1600 
— 19958400 — 39916800 729937600 
: 26611200 1 05600 
— 26611200 ¬ 20011200 
. 13305600 
- 119750400 — 59875200 —29937600 — 14968800 - 19958400 —9979200 
59875200 50875200 44006400 59875200 39916800 
. — 29937600 Eco ы чыг ди 
3 : = 39916800 ا‎ 
. 19958400 
_95800320 47900160 23050080 11975040 15966720 7983360 
: = 47900160 —47900160 —35925120 —47900160 —31933449 
5 -10 
„23950080 35925120 2303o000 
5 =20 20 
211975992 
- 6o - 
ү v 4 gmo 15966720 
5 -15 . Б 15 » 
5 -325 30 15 -30 
5 -35 80 -60 15 =6о 
5 -з0 45 . 30 aad 
5 —20 10 . 20 
—30 so —20 20 —40 
; 235 jo 650 35 —120 
10 —$0 50 . 50 ووس‎ 
10 -70 150 —100 so —150 
6 Е . . 2 
6 -12 m . 
6 -24 24 ` y 
6 5 72 ~48 3 3 
9 28 6 Ў 5 -36 
Sarai aos 4 А 36—198 
6 - 12 . 24 à 
6 = бо -24 E =48 
зї = 7 -10 60 -9° 
12 -72 1 -24 72 -M4 
7 -7 . B an Я 
7 -21 14 . $ ¥ 
e E MB ear 
7 -42 77 Иа E! E 
7 -35 42 imis =1 
12 E 140 -70 "0 75 
3 Zia 32 А H = 
8 -48 96 -64 A Eas 
8 -40 $ { > 28 
8 -48 -32 s 
8 
9 -27 9 à > = 
> 63. ~18 18 36 
$ ота S арр ЕНЕ сиз. 
то - 3o Д 5 280: 
10 = 6 zh зо -66 
m Н н uae eo, ch 


Б Ed 


ИЕ эн 


= 


LL aaa a 
110880 166320 83160 
-r -—661120 ~ due 

9з! 12972960 

1 E n 13721400 
. 73742200 
E E RE 
53222400 43243200 


= 39916800 —9979200 72997699 
931 o 26611200 19550 
73991 —26611200 — 20011200 
33264000 . 13305600 
8870400 i 1 
3316400 — 5087520 -2993760 
30916800 22952160 
-2003 -29937600 ир 
—1330$600 — зоо — : 
anus E 
> . 79979200 
= 13305600 . . 
‚ 29937600 14968800 
. — 14968800 
2661120 3991680 1995840 
15966720 —23950080 — 13970880 
23950080 11975040 17962560 
E . 78987520 
10644480 31933440 15966720 
—31933440. + -1$966720 
Em ; 
45 
z — 11975040 
. -20 
21975040 
ёз сю & 
й -50 . 
d —50 100 
ОЖАЙ ОГ; 
ч -24 48| 
1 -60 3° 
E -72 72 
x ES -a 
35 -70 105 
Xo us €* 
| 
КЕ Ер } 18 
547 lo { 27 
$ -20 45 
п -3)00 20 23 
3617 - 3 ) 7 


| 


G 98 i 


BOG) 
E JA mt 


M ad 
we D А 


4 


| J | 
na 


ot 


d 4 

" i | 
E 3) 2» 
owt DN 
(5) (4) (3) 
(5)* (x)* 
(5)* (2) 
(6) (1)* 
(6) (2) (1)* 
(6) (2)* (1)* 
(6) (a 
(6) (з) (x)* 
(6) (з) (2) (1) 
(6) (3)* 
(6) (4) ()* 
BBA 
(7) К. 
Qo 
E 


ааа, 


27720 
—$26680 
ES 
11226600 
73742200 


83 1600 


79313920 
30935520 


663280 
— 5987520 
E Во 
ч. 20 
2800 
* | 23950080 
3991680 
13044480 
—3991 
11975 


а?а} 


3264 

-685280 

_ 4324320 
7. 


E 
719958400 
13305600 

11200 
13305600 


— 1995849 
19958. 
72993 
— 39916800 
app 


27937600 


- 15 6720 
33950069 


31933440 
731933440 


— 47900169 
-19160064 


25 
25 


35 


Table 5.12 (cont.) 


аа, a.a, ааа 
16632 665280 332640 
-349272 ~ — 5322240 
оро е або 
un a 
73742200 б 
665280 26611200 13305600 
—7983360 —79833600 ~5 
23950080 3991 
—26611200 
0872800 26611200 13305600 
719958400 .- 13305600 
Li H 
Bree 
—997920 - 59875200 —29937609 
10977120 59875200 59875200 
Е + —29937600 
E 
—19958400 
14968800 
714968800 
36 95800320 47900160 
Em + + —47900160 
1958400 . : 
—11975040 
1 20 
731933440 
15966720 
—23950080 
23950080 
9580032 
-50 
ее —399168оо 
` 39916800 
-6 12 
-6 24 
—6 36 
-6 18 
-6 зо 
-6 36 
-6 
ae at 
-6 30 
-12 72 
yay d . 
-7 14 
m— 28 
=F 21 
= 
35 - 5 
c = 3 
. = R, 
-$ 40 
8 m 
. -9 18 
. -9 36 
. -9 45 
. —10 зо 
710 —10 50 
. -1 
-13 -12 4 


a.a, as аа? ааа? адаа аа) 
166320 83160 110880 5 18480 
— 2827440 — 1490880 —1995840  —10 T5 — 388080 
12640320 33880 9979200 5987520 2661120 
—19958400 E ы. ee —13305 - те 
12474000 1621 4 1081 914' 
pu c one ee 
6652800 — 332040 4656060 232 813120 
Er — 199: — 2993 — 17297280  —7983360 
569600 3991 3991 26611200 
— 19958400 —33264000 —3326400 —26611200 — 31046400 
. 9979200 . 1663200 332! 
6652800 3326400 13305600 6652800 3769920 
300 —9979200 —39916800 —26611200 — 23284800 
52800 . 1995] 395 
-33 . — 1108800 
8870400 4435200 5913600 
А + — 4435200 сия 
< К * 56800 
—14968800  —7484400 —9979200 -—9989600  —1663200 
44906400 — 29937600 39916800 24948000 11642400 
"Moodle E —29937600 —34927200 —24948000 
got 299; + 14968800 14968800 
` . —% 6 . 
. —19958400 -—9979200 — 6652800 
erp 1995 26611200 
+  —9979200 “5 
. = 6652800 
6652800 
23950080 r 1975040 15966720 7983360 2661120 
47900160 —35925120 —47900160 —31933 P — 15966720 
23950080 35023120 . 2395 23950080 
. 31933449 ана 10644480 
+ —15966720 — 31933440 
10644480 
—19958400 — 9079200 —1330$600 —6652800  —2217600 
39916800 29937600 39916800 26611200 13305600 
219958400 — 29937600 + 719958400 ~ 19958400 
-24 
—72 =. 
= 26611200 —13305600  —8870400 
-1 
13305600 26611200 
—36 -18 36 
— 8870400 
-54 —36 108 -54 
-12 . -4 5 . 
E poe $ ; 
—108 24 -72 144 -36 
~28 . ‹ 
М -21 . 
—42 -21 42 
-14 -28 К 
—3$ -35 35 
-16 : 3 $ 
—64 32 . 
= Д = 24 24 
m 16 -32 32 
. * ag . . 
—36 . 22 18 . 
هو‎ . =3 8r m f 
-10 Д —20 z К 
—70 20 -20 40 . 
“3 . -f 22 . 
-72 12 - 72 -12 


—a6611200| 
39916800) | 


Table 5.12 (cont.) 


m 
gia (vi) | ааа ааа a СЕУ ааа аага, aaa," LIII mam ва, LIN 
E— - y wet. 
j 1)” | 13860 7520 880 
| | —3040%0 S šis -2772 = 1995840 bs 33440 LL - io -1940 -t айа Е. 
> Bray 2207740 1184240 291060 9979200 ss 3516480 TEE 261 
ada" 648 EC sho —1275120 —9979200 — —7983360 -6652800 — à ~31 =1164240 ~ 
aM (| 76923300 — 4573 * Т зно йо — 582120о 3742200 2079000 1247400 
ay (ur “yas —1247400 — 1247400 » dam . 2494800 1247400 — 1247400 x | 
3 . . . . . * . . . 
[OTIO 3 73920 33 1663200 1140480 308880 71280 13 | 
Ке е 1 E a unis mi onm даа - 
8 a)" б »* 24284800 1 52000 6232 19958400 21931600 - 11200 266! 1200 1 12908800 эе E 
* (1)* | -23284800 —16632000 — 11088000 9979200 — -13305600 — - - . 
[DUE 30 E [D se 35 À2 — 
Table 5.12 (cont.)

[Tabulated coefficients of the symmetric functions of weight 12; the numerical entries are not legible in this copy.]



CONTROL CHARTS WITH WARNING LINES 


By E. S. PAGE
Statistical Laboratory, University of Cambridge* 


Process inspection schemes using control charts with warning lines are considered and the properties 
of some schemes based on the observations from the last few samples are evaluated. Tables of schemes 
for controlling the mean of a normal population are given. A method is suggested for controlling both 
the mean and standard deviation of a population on a single chart; examples are given when the 
population distribution is normal. 


1. INTRODUCTION


A problem that arises in industrial applications of statistics is to detect changes in the para- 
meters specifying the quality of the output from a continuous production process, so that 
some rectifying action can be taken to restore the parameters to satisfactory values. A widely 
used scheme for this purpose (a process inspection scheme) is based on a control chart 
(Shewhart, 1931). Samples of fixed size are taken at regular intervals and a statistic of the 
sample (for example, the mean, the range, or the number of defectives) is plotted on the 
chart; if the sample point falls outside control limit(s) drawn on the chart, i.e. if the statistic 
differs by more than a given amount from its satisfactory value, rectifying action is taken. 
This scheme will be referred to as a single-sample scheme, since the decision whether or not 
to take action is based upon a single point on the chart; also in what follows the lines on the 
chart denoting a serious departure of the statistic from its satisfactory value will be called 
action lines instead of control limits. With any process inspection scheme there will in general
be a delay between the point at which the parameters changed and that at which the scheme 
demands action. During this delay the production fails to meet the quality requirements. 
On the other hand, in practical cases, process inspection schemes, in particular single-sample 
schemes, cause action to be taken even when the parameters constantly maintain satisfactory
values. In the one case there is a loss due to the production of substandard material,
and in the other due to unnecessary interference with the production process. These losses, 
as a function of the parameters, may be used as a basis for the selection of process inspection 
schemes or for the comparison of two schemes. When the fraction of the production that 
is inspected is constant, a convenient assessment of these losses can be obtained from 
the average number of articles that are inspected before the scheme requires rectifying 
action when the parameters remain constant; this number, a function of the parameter 
values, has been called the average run length (Page, 1954a; cf. Aroian & Levene, 1950). 
We shall base the choice of inspection schemes on the behaviour of their average run length 
functions. 

* Present address: University of Durham.

Consider a single-sample control chart scheme for controlling the mean of a normal
distribution. Suppose that the standard deviation, $\sigma$, of the process is known, and that the
‘ideal’ value for the process mean is $\mu$. Then samples of size $N$ are examined at regular
intervals, and if the mean, $\bar{x}$, falls outside the action lines drawn at $\mu \pm B_1\sigma/\sqrt{N}$, rectifying
action is taken. The current practice is to choose $N$, usually small, and to use ‘three-sigma’
limits, i.e. to take $B_1 = 3$. These action lines are chosen so that the chance is about 0.002
that a given sample point will cause action to be taken when the mean is at its satisfactory
value $\mu$. This consideration gives no guidance about the best sample size to choose nor any
information about the average amount of time that will elapse before a change in the mean
is noticed. If the fraction of production that may be inspected is fixed, for example, by a
limitation on the man-hours for inspection, it is more realistic to choose the scheme with
the most suitable average run length function. Tables giving values of $N$ and $B_1$ have been
constructed to enable this choice to be made easily for controlling the mean of a normal
distribution (Page, 1954b), and similar tables could be computed for other situations. It
turns out that the best sample size to use is often much larger than those customarily taken 
in industrial practice, so that samples should be taken less frequently. But although the 
theoretically best sample size is large it may not be practically convenient to take samples 
of this size; further, the quality control engineer has got used to small samples and he may 
be reluctant to change his habits radically. There is, too, his (correct) feeling that a small 
sample will spot a very serious change in the mean, and for this reason he will wish to continue 
taking small samples frequently. It is therefore of interest to seek schemes of the control 
chart type that are easy to apply, that require only small samples to be taken, and yet retain 
the advantages of the single-sample schemes using large samples. One method is suggested 
by a modification of the single-sample scheme that has occasionally been used (Dudding &
Jennett, 1944); in this a cluster of ‘moderately’ extreme sample points is treated as a single
point outside the action lines and accordingly action is taken. A point is adjudged
‘moderately’ extreme if it lies between warning lines and the action lines.* Thus for controlling the
mean of a normal population with known standard deviation, $\sigma$, at an ideal value $\mu$ warning
lines would be drawn at $\mu \pm B_2\sigma/\sqrt{N}$, where $B_2 < B_1$.
We shall consider rules of the following type: 


I. Choose k, n, N. Take samples of size N. Take action if any point falls outside the action
lines or if any k out of the last n points fall outside the warning lines.
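
As an illustration only, rule I can be sketched as a short routine that scans a sequence of plotted points and reports the sample at which action is first required. The function name `rule_one_run`, its arguments, and the choice to express points as deviations from a centre line are assumptions made for this sketch; points falling exactly on a line are counted in the more extreme region.

```python
from collections import deque


def rule_one_run(points, k, n, warning, action, centre=0.0):
    """Return the 1-based index of the sample at which rule I first
    requires action, or None if no action is required.

    `warning` and `action` are the distances of the warning and action
    lines from `centre` (hypothetical argument names)."""
    recent = deque(maxlen=n)  # flags for the last n points only
    for index, x in enumerate(points, start=1):
        deviation = abs(x - centre)
        if deviation >= action:
            return index  # a point outside the action lines
        recent.append(deviation >= warning)
        if sum(recent) >= k:
            return index  # k of the last n points outside the warning lines
    return None
```

With `k = 2`, `n = 3`, warning lines at 1.5 and action lines at 3.0, the sequence `[1.6, 0.0, -1.7]` requires action at the third sample, whereas `[1.6, 0.0, 0.0, -1.7]` does not require action.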


In the next section we show that in a certain sense it is reasonable to consider only two 
special cases of rule I, and in the following section their properties are evaluated. 


2. RESTRICTION OF THE RULES 


In the rules of type I only the region of the chart in which a sample point falls is taken into
account when deciding whether or not to take action; the position of the sample point within, 
say, the warning region is not considered. For the rules it is only necessary to count the 
number of points in the various regions; however, for mathematical convenience it is useful 
to suppose that a score is assigned to each point according to the region in which it falls. 
A point outside the warning or action lines is obtained when the quality of the sample falls 
below the required level; if a mark or score is to be assigned to the sample such a defection 
will merit some penalty. On the other hand, a point within the warning lines will receive 
a bonus score. It is reasonable to base a process inspection scheme on these scores; an
accumulation of penalties will be taken to indicate a deterioration in quality and suitable 
rectifying action will be taken. After each sample is taken the total penalty score over the 
last ‘few’ samples is examined and action is taken if it is ‘large’. In order to make the scheme 
precise, let a score $x_i$ be assigned to the $i$th sample, where $x_i = -a, b, c$ ($a > 0$, $c > b > 0$),
according as the sample point falls within the warning lines, between the warning and action
lines, or outside the action lines, respectively. The scheme is:

II. Take action after the n-th sample if any of the inequalities

$$\sum_{i=0}^{r} x_{n-i} \ge h_r \qquad (r = 0, 1, \ldots, s)$$

are satisfied, where $s$ and the $h_r$ are suitably chosen constants.

* A chart using only warning lines has been considered by Weiler (1953).


We consider the case where $s$ is limited only by the number of samples drawn since action
was last taken, so that all the partial sums, working back from the last sample, are examined;
and where $h_r = h$, all $r$, so that the total scores in the last one, two, three, ..., samples are
tested to see that none is greater than $h$. In an earlier paper (Page, 1954a) it was shown that
this scheme is equivalent to a sequence of linear sequential tests (Barnard, 1946; Wald,
1947) with initial score on the acceptance boundary; if the test ends on the acceptance
boundary the test is reapplied, while if it ends on the rejection boundary action is taken.
Consequently we consider what restrictions must be placed on $k$ and $n$ in order that the
scheme I shall be equivalent to the repeated application of a linear sequential test with
boundaries at $(0, h)$ and initial score zero.*
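
The equivalence between examining all backward partial sums and repeatedly applying a sequential test with boundaries at 0 and h can be illustrated numerically. The sketch below is not taken from the paper: the function names are hypothetical, and action is taken to be required as soon as a total reaches h.

```python
def action_time_partial_sums(scores, h):
    """First sample (1-based) at which some partial sum, working back
    from the latest score, reaches h; None if this never happens."""
    for n in range(1, len(scores) + 1):
        total = 0
        for x in reversed(scores[:n]):  # last one, two, three, ... scores
            total += x
            if total >= h:
                return n
    return None


def action_time_sequential(scores, h):
    """The same decision reached by repeated sequential tests with
    boundaries at 0 and h, each started from score zero."""
    total = 0
    for n, x in enumerate(scores, start=1):
        total += x
        if total >= h:
            return n  # rejection boundary reached: take action
        if total <= 0:
            total = 0  # acceptance boundary reached: restart the test
    return None
```

For example, with scores -1, 2, 3 for the three regions and h = 3, both functions give action at the third sample for the sequence `[2, -1, 2]` and no action for `[2, -1, -1, 2]`.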

First, we have that an initial sequence of $(k-1)$ points between warning and action lines
has total penalty score $(k-1)b$; since this sequence does not require action to be taken,

$$(k-1)b < h. \tag{1}$$

If now this sequence is followed by $(n-k)$ points within the warning lines, each receiving
a bonus score $-a$, and then by another point between the warning and action lines, scheme I
requires action to be taken. Hence the total penalty score must be at least $h$, i.e.

$$kb - (n-k)a \ge h. \tag{2}$$

Further, any sequence of $(n-k+1)$ bonus points (i.e. between the warning lines) is
equivalent to restarting the scheme; in particular, the set consisting of $(k-1)$ points between
the warning and action lines followed by $(n-k+1)$ bonus points must have a total penalty
score at most zero. Hence

$$(k-1)b - (n-k+1)a \le 0; \tag{3}$$

indeed, if this inequality were not satisfied a finite number of such sets of $n$ points would
cause the total penalty score to exceed $h$ and action to be taken, contrary to the conditions of rule I.

The combination of (1) and (2) gives

$$b/a > n-k, \tag{4}$$

and hence with (3) we obtain

$$(n-k) < \frac{n-k+1}{k-1}, \tag{5}$$

since $k \ge 2$ for the warning lines to be distinct from the action lines. It follows that
$(n-k)(k-2) < 1$, which for $k > 2$ may be written

$$n < \frac{(k-1)^2}{k-2} = k + \frac{1}{k-2}. \tag{6}$$

For scheme I, $k$ and $n$ are integral and $2 \le k \le n$, so that (6) can be satisfied only for
(i) $k = 2$, any $n$,
or (ii) $k = n$.
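
The restriction can be checked by direct enumeration. The sketch below is illustrative only (the function name `admissible` is hypothetical): for each integral pair it asks whether any ratio b/a can satisfy both b/a > n - k, from (4), and b/a <= (n - k + 1)/(k - 1), from (3).

```python
from fractions import Fraction


def admissible(k, n):
    """True if some ratio b/a satisfies inequalities (3) and (4)."""
    lower = Fraction(n - k)             # (4): b/a must exceed n - k
    upper = Fraction(n - k + 1, k - 1)  # (3): b/a at most (n - k + 1)/(k - 1)
    return lower < upper


# All admissible pairs with 2 <= k <= n <= 8.
pairs = [(k, n) for n in range(2, 9) for k in range(2, n + 1) if admissible(k, n)]
```

Every pair collected in `pairs` has either `k == 2` or `k == n`, in agreement with cases (i) and (ii).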


* The restriction on the initial score is unnecessary; if the initial score is $Z$ ($0 \le Z < h$), inequalities
are obtained in a similar way and found to imply $Z = 0$.




The two rules we consider therefore are: 


III. Choose n, N. Take samples of size N. Take action if two points out of any sequence of 
n fall between the warning and action lines or if any point falls outside the action lines. 

IV. Choose n, N. Take samples of size N. Take action if n consecutive points fall between the 
warning and action lines or if any point falls outside the action lines. 


In these two cases it cannot happen that some sets of points not requiring action to be 
taken have greater total penalty scores than some sets leading to action; for any other 
values of k, n the anomalous position can, however, occur. The above theory admits the 
corollary that no other types of warning lines, more or less extreme than those considered 
above, may be introduced and yet permit the equivalence of a rule of type I based on the
last $n$ points, and a rule of type II. In the next section we derive the average run lengths of
the schemes III and IV. 


3. THE AVERAGE RUN LENGTHS OF THE RULES 


Although we shall not use the following method, for completeness we remark that the
average run length of the general rule of type I may be evaluated by enumerating the
possible combinations of the last $(n-1)$ points on the chart such that action has not been
required, and treating the combinations as the states of a discrete Markov chain (e.g.
Bartlett, 1953; Feller, 1950). The matrix of transition probabilities, $P$, may be written down;
the vector giving the probability that the chart is in a given state after the $r$th point is
$p_r = P^r p_0$, where $p_0$ is the vector specifying the initial state. Accordingly, the probability
that action has not been taken up to and including the $r$th sample is the sum of the elements
of $p_r$, i.e. $\mathbf{1}'p_r$, where $\mathbf{1}' = (1, 1, \ldots, 1)$. Hence the probability that action is taken immediately
after the $r$th sample is $\mathbf{1}'(p_{r-1} - p_r)$, so that the average run length is given by

$$L = \mathbf{1}' \sum_{r=1}^{\infty} r\,(p_{r-1} - p_r)\,N,$$

i.e.

$$L = \mathbf{1}'(I - P)^{-1} p_0 N, \tag{7}$$


where N is the size of the sample. 
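
A minimal numerical sketch of formula (7) follows, using only the standard library. It assumes the convention that `P[j][i]` is the probability of passing from state i to state j, so that the state vector is multiplied on the left by P; the helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, m):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, m + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * m
    for row in reversed(range(m)):
        tail = sum(aug[row][j] * x[j] for j in range(row + 1, m))
        x[row] = (aug[row][m] - tail) / aug[row][row]
    return x


def average_run_length(P, p_initial, N):
    """Evaluate L = 1'(I - P)^{-1} p_0 N for a chain on the no-action states."""
    m = len(P)
    i_minus_p = [[(1.0 if r == c else 0.0) - P[r][c] for c in range(m)]
                 for r in range(m)]
    # Expected number of samples spent in each state before action.
    visits = solve_linear(i_minus_p, p_initial)
    return N * sum(visits)
```

For a single-sample scheme there is one state and `P = [[q]]`, where q is the probability that a point falls within the action lines; the routine then reduces to N/(1 - q).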

In the general case the size of the matrix $P$ increases rapidly with $n$ so that the labour
involved in inverting $I - P$ is considerable. Fortunately, the transition matrices for the rules
III and IV are of a simple type and the states can be more simply specified so that the
average run lengths can be found with little trouble. Let the probabilities that a given point
falls between the warning lines, between the warning and action lines, and outside the action
lines be $p_0$, $p_1$, $p_2$ respectively. Suppose that a suitable convention is made for points falling
on the lines; for example, that any point falling on a line is to be regarded as falling into the
adjacent more extreme region of the chart. With such a convention, clearly $p_0 + p_1 + p_2 = 1$.

Consider first rule III; it has been shown that this rule is equivalent to a sequence of 
linear sequential tests with initial score on the acceptance boundary. By inspection it is 
seen that appropriate scores to assign to the sample point are given by 


$$a = 1, \quad b = n-1, \quad c = n, \tag{8}$$

and action is to be taken if the total penalty score rises a height $h = n$ above its previous
least value; or equivalently if the sequential test ends on the rejection boundary given by
$h = n$. When action is not required the position of the total penalty score relative to its
previous minimum or of the cumulative score in the test is specified by one of the integers
$0, 1, \ldots, n-1$. These may be regarded as the states of a Markov chain and the transition
matrix $P$ is seen to be

$$P = \begin{pmatrix} p_0 & p_0 & 0 & \cdots & 0 \\ 0 & 0 & p_0 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & p_0 \\ p_1 & 0 & 0 & \cdots & 0 \end{pmatrix}, \tag{9}$$

the element in row $j$ and column $i$ being the probability of passing from state $i$ to state $j$.


The average run length is then given by equation (7). Alternatively, let $L_i$ be the average
number of observations drawn before action is taken when the state is $i$. Then $L_0$ is the average
run length of rule III. We have, by taking expectations conditional upon the result of the
first sample,

$$L_0 = p_0(N + L_0) + p_1(N + L_{n-1}) + p_2 N = N + p_0 L_0 + p_1 L_{n-1}. \tag{10}$$

Similarly, we obtain the set of equations

$$\begin{aligned} L_0 &= N + p_0 L_0 + p_1 L_{n-1}, \\ L_1 &= N + p_0 L_0, \\ L_i &= N + p_0 L_{i-1} \qquad (1 < i \le n-1). \end{aligned} \tag{11}$$

The last $n-1$ of these equations may be solved successively for $L_1, \ldots, L_{n-1}$ in terms of $L_0$,
and the first equation used to evaluate $L_0$. We obtain the average run length of rule III:

$$L_0 = \frac{(1 - p_0 + p_1 - p_1 p_0^{n-1})\,N}{(1 - p_0)(1 - p_0 - p_1 p_0^{n-1})} \qquad (n \ge 2). \tag{12}$$
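
As a check on the algebra, the closed form (12) can be compared with a direct numerical solution of equations (10) and (11). The sketch below is illustrative only: the function names are hypothetical, and repeated substitution is used here as an independent check, not as the method of the paper.

```python
def arl_rule_three(p0, p1, n, N):
    """Closed form (12) for the average run length of rule III (n >= 2)."""
    tail = p1 * p0 ** (n - 1)
    return N * (1 - p0 + p1 - tail) / ((1 - p0) * (1 - p0 - tail))


def arl_rule_three_iterative(p0, p1, n, N, sweeps=5000):
    """Solve equations (10) and (11) by repeated substitution, starting from zero."""
    L = [0.0] * n
    for _ in range(sweeps):
        new = [N + p0 * L[0] + p1 * L[n - 1]]            # state 0
        new += [N + p0 * L[i - 1] for i in range(1, n)]  # states 1, ..., n-1
        L = new
    return L[0]
```

With the arbitrary illustrative values p0 = 0.8, p1 = 0.15, n = 3 and N = 5 the two functions agree to within rounding error. When p0 is very close to one the substitution converges slowly and more sweeps would be needed.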
Scores for the various positions of the sample point on the chart for the repeated sequential
tests equivalent to rule IV can be taken to be $a = n-1$, $b = 1$, $c = n$ and $h = n$. There are
again only $n$ possible states before action is taken for the Markov chain. The average run
length is $L_0$, where $L_0$ is given by the equations

$$\begin{aligned} L_i &= N + p_0 L_0 + p_1 L_{i+1} \qquad (0 \le i < n-1), \\ L_{n-1} &= N + p_0 L_0. \end{aligned} \tag{13}$$

We obtain the average run length of rule IV:

$$L_0 = \frac{(1 - p_1^n)\,N}{1 - p_0 - p_1 + p_0 p_1^n}. \tag{14}$$
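
The solution of equations (13) can be sketched in the same spirit. Here each L_i is written as c_i + d_i L_0 and the coefficients are obtained by back-substitution from i = n - 1 down to i = 0; the function names and this bookkeeping are assumptions made for the illustration.

```python
def arl_rule_four(p0, p1, n, N):
    """Closed form (14) for the average run length of rule IV."""
    return N * (1 - p1 ** n) / (1 - p0 - p1 + p0 * p1 ** n)


def arl_rule_four_from_equations(p0, p1, n, N):
    """Solve equations (13) with L_i = c_i + d_i * L_0."""
    c, d = N, p0                  # L_{n-1} = N + p0 * L_0
    for _ in range(n - 1):        # L_i = N + p0 * L_0 + p1 * L_{i+1}
        c, d = N + p1 * c, p0 + p1 * d
    return c / (1 - d)            # L_0 = c_0 + d_0 * L_0
```

For n = 2 the result is N(1 + p1)/(1 - p0 - p0 p1), which is also what the rule III formula gives for n = 2, as expected since the two rules then coincide.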


4. CHOICE OF SCHEME


In each of the two types of rule, III and IV, there are certain constants that may be chosen
to give the scheme adopted some desired properties. Consider a two-sided chart with two
sets of warning and action lines symmetrically placed about the ideal value for the parameter,
or a one-sided chart with one warning and one action line. The disposable constants are then
the sample size, N, the number of sample points considered, n, and the positions of the 
warning and action lines; the scheme may also be of one of two types. This may be
contrasted with the single-sample control chart scheme in which there are only two disposable
constants, the sample size and the position of the action lines. In the single-sample case
the scheme could be chosen to give a specified average run length for some unsatisfactory
value of the parameter and maximum average run length for the ideal value (Page, 1954b);
that is to say, the amount of scrap at a certain value of the parameter that can be tolerated 
is stated and the scheme is chosen to satisfy this requirement and to give the longest run 
without action being taken when the quality is satisfactory. A corresponding choice for 
a scheme with warning lines could be made by selecting two unsatisfactory values of the 
parameter and the average run lengths that can be tolerated at these values and then 
choosing the disposable constants to gain maximum average run length on the ideal quality; 
in this way the average run length will have desirable properties over a range of the para- 
meter. In order to illustrate this we consider schemes for controlling the mean of a normal 
population with known variance, $\sigma^2$, for two-sided deviations from the ideal mean. Suppose
that it is desired to obtain a scheme with average run lengths of 100 and 25 when there are
departures from the ideal mean of amounts $0.4\sigma$ and $0.8\sigma$ respectively. The large number of
disposable constants would make it laborious to find the scheme of the types considered
which has the greatest run length on ideal quality, but one scheme with approximately the
run lengths stated above is that of type IV with $n = k = 3$, sample size $N = 20$, warning lines
at $\mu \pm 1.5\sigma/\sqrt{N}$, and action lines at $\mu \pm 2.875\sigma/\sqrt{N}$. We can compare this scheme with the
best single-sample schemes for detecting the two sizes of departure from the mean. From
the tables given in an earlier paper (Page, 1954b, Tables 2a-c) the single-sample scheme
having average run length 100 when the mean is $\mu + 0.4\sigma$ and maximum average run length
when the mean is $\mu$ has action lines drawn at $\mu \pm 2.82\sigma/\sqrt{N}$, and sample size $N = 70$.
Similarly, the single-sample scheme with action lines at $\mu \pm 2.83\sigma/\sqrt{N}$ and sample size $N = 17$
has $L = 25$ at mean $\mu + 0.8\sigma$ and maximum average run length at mean $\mu$. The average run
lengths for the three schemes for several values of $\lambda$, where the mean is $\mu + \lambda\sigma$, are shown in
Table 1. It is seen that the rule of type IV has approximately the same run length as the
single-sample scheme with sample size 17 for large deviations ($> 0.8\sigma$) from the true mean;
on the other hand, its run length for smaller deviations ($0.2 \le |\lambda| \le 0.4$) is rather less than
that of the single-sample scheme. Again it is seen that, while the type IV scheme compares
unfavourably with the second single-sample scheme ($N = 70$, $B_1 = 2.82$) in its behaviour
on good quality, it gains considerably for the large departures ($|\lambda| > 0.4$); clearly for a
single-sample scheme to cause action to be taken at least one sample point must be plotted.
The situation we are considering is where an amount $N/f$ of production is output between
visits from the inspector (where $f$ is the fraction of output sampled); thus the relevant
quantity to consider for even extreme deviations from the mean is the average run length,
or approximately the sample size.


Table 1. Average run lengths of control chart schemes

            Single-sample schemes           Type IV scheme
   λ        N = 17         N = 70           n = k = 3, N = 20
            B_1 = 2.83     B_1 = 2.82       B_1 = 2.875, B_2 = 1.5

  0.0        3652          14600            3368
  0.2         780            560             545
  0.4         146            100              99
  0.6          47             71              42
  0.8          25             70              26
  1.0          19             70              21
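
The run lengths of the kind compared here can be recomputed from the normal distribution function. The sketch below is an illustration under stated assumptions: the function names are hypothetical, shifts are measured in units of the standard deviation, the type IV run length is taken in the closed form of equation (14), and n = 3 is taken from the Table 1 heading.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def region_probabilities(lam, N, B1, B2):
    """(p0, p1, p2) for a two-sided chart for means when the process mean
    has moved by lam standard deviations from its ideal value."""
    d = lam * sqrt(N)  # shift in units of sigma / sqrt(N)
    inside_action = phi(B1 - d) - phi(-B1 - d)
    p0 = phi(B2 - d) - phi(-B2 - d)
    return p0, inside_action - p0, 1.0 - inside_action


def arl_single_sample(lam, N, B1):
    """Single-sample scheme: warning lines coincide with action lines."""
    _, _, p2 = region_probabilities(lam, N, B1, B1)
    return N / p2


def arl_type_four(lam, N, n, B1, B2):
    """Type IV scheme, using the closed form of equation (14)."""
    p0, p1, _ = region_probabilities(lam, N, B1, B2)
    return N * (1 - p1 ** n) / (1 - p0 - p1 + p0 * p1 ** n)
```

Calling `arl_type_four(0.0, 20, 3, 2.875, 1.5)` and `arl_type_four(0.4, 20, 3, 2.875, 1.5)` gives values close to 3368 and 99, and `arl_single_sample(0.0, 17, 2.83)` gives a value close to 3652.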

The choice of the best sample size could be examined in the same way as in the earlier 
paper, but it would be extremely laborious to do so. In general, there will exist a value of 
N giving maximum average run length when the production has the ideal quality and 
specified average run lengths at two values of the mean. The effects of different sample 
sizes are illustrated by examples of rules of type IV which have average run lengths of 100 
and 25 at $\mu + 0.4\sigma$, $\mu + 0.8\sigma$ respectively; the sample size, $N$, the positions of the warning and
action lines, $B_2$ and $B_1$, the number, $n$, of consecutive samples considered, and the average
run length, $L$, on the ideal mean are shown for five schemes in Table 2. There is, of course,
an upper limit to the size of the sample that can be used for the scheme to have the desired
run lengths. It is, however, again preferable to use larger samples if this is practically
possible.


Table 2. Average run lengths of type IV schemes

   N     n     B_1      B_2      L(0.8σ)    L(0.4σ)    L(0)

   5     3     3.25     1.25      25.3      118.0       584.6
  10     3     3.125    1.25      25.2      103.9      1096.9
  15     3     3.00     1.375     25.6      104.2      2286.4
  20     3     2.75     1.75      25.0      101.6      3155.9
  23     3     2.625    ...       25.9       92.3      ...


5. THE TABLES


In the Appendix are given tables of the average run lengths of schemes for controlling the 
mean of a normal population with known standard deviation. The run lengths are tabulated 
for some schemes of type IV with sample sizes 5, 10, 15 and 20, taking into account the means 
of the last three or four samples drawn (n = 3 or n = 4). In order to use the tables to deter- 
mine the suitable control chart scheme to be used it is necessary to decide first upon the 
average run lengths that can be permitted at two process means different from the ideal 
one, and which of the above sample sizes it is most convenient to use. A scheme approximately
satisfying these requirements may then be selected by inspection of the tables,
possibly using some rough interpolation. The schemes shown in Tables 1 and 2 serve as 
examples of the method. 
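
Selection by inspection of tables can be imitated by a coarse search over the positions of the lines. The sketch below is an assumption-laden illustration and not the procedure used to construct the tables: the grid spacing, the treatment of the permitted run lengths as upper limits, and all function names are choices made here.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def arl_type_four(lam, N, n, B1, B2):
    """Average run length of a two-sided type IV scheme at a shift of lam sigma."""
    d = lam * sqrt(N)
    inside_action = phi(B1 - d) - phi(-B1 - d)
    p0 = phi(B2 - d) - phi(-B2 - d)
    p1 = inside_action - p0
    return N * (1 - p1 ** n) / (1 - inside_action + p0 * p1 ** n)


def choose_scheme(N, n, targets, step=0.125, b_max=4.0):
    """Among grid values B2 < B1 whose run length does not exceed the limit at
    every (shift, limit) pair in `targets`, return (L(0), B1, B2) with the
    largest run length on the ideal mean, or None if no grid scheme qualifies."""
    best = None
    grid = [step * i for i in range(1, int(b_max / step) + 1)]
    for B1 in grid:
        for B2 in grid:
            if B2 >= B1:
                continue
            if all(arl_type_four(lam, N, n, B1, B2) <= limit
                   for lam, limit in targets):
                L0 = arl_type_four(0.0, N, n, B1, B2)
                if best is None or L0 > best[0]:
                    best = (L0, B1, B2)
    return best
```

For instance, `choose_scheme(20, 3, [(0.4, 100.0), (0.8, 25.0)])` searches schemes with N = 20 and n = 3 whose run lengths at departures of 0.4 and 0.8 standard deviations do not exceed 100 and 25.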

It has been remarked that the use of moderately sized samples improves the average run
length function; consequently such samples are to be preferred where possible. For samples
of ten and over the average run length for departures from the ideal mean of more than a
standard deviation is near the sample size; this is easily seen since the probability that a 
point falls outside the action lines is nearly unity. 

Similar tables could be constructed for the average run lengths of schemes of type III.
The two types of scheme are, of course, identical when the number of sample points
considered is two. A few calculations indicate that the two types of scheme have very
similar properties and that, corresponding to a given scheme of one type, there exist schemes
of the other type with approximately the same average run lengths.


6. SIMULTANEOUS CONTROL OF THE MEAN AND STANDARD DEVIATION 


It is often required to control simultaneously both the mean and the standard deviation 
of a normal population. For example, when goods are being manufactured to a specification
a change in either the process mean or standard deviation would alter the fraction of
defective articles produced; one process inspection scheme for this situation (Jennett &
Welch, 1939) is based upon the ratio $(U - \bar{x})/s$, where $U$ is an upper tolerance limit, and
$\bar{x}$, $s^2$ are sample estimates of mean and variance. Another possibility is to use the easily
calculated estimate of the standard deviation from the sample range, $w$, in place of $s$ and so
base the scheme upon $(U - \bar{x})/w$. The more usual procedure in practice is to keep two charts,
one recording the means and the other the ranges, of the samples. Before setting up any of
these schemes it is important to consider both the type of rectifying action that will be
taken and the changes that are likely to occur. If changes in the mean and standard
deviation require different rectifying actions it is necessary that the inspection scheme indicate
the type of change and not merely its existence. Again, the choice of scheme will be influenced
by the relative frequency of occurrence and importance of the types of change. For example,
if changes in the standard deviation are very infrequent there is little point in estimating $\sigma$
from each sample as would be necessary in Jennett & Welch's rule. The same considerations
will hold to decide the average run lengths for various types of change that the scheme must 
achieve. These remarks lead us to consider the possibility of controlling both mean and 
standard deviation on a single chart for means when changes in the standard deviation are 
relatively rare or unimportant. The remainder of this paper is devoted to the development 
of such a scheme and to comparing it with a conventional scheme. 

The control chart schemes using charts for the mean and range are often of the
conventional type with action lines drawn so that the probability that a sample point falls outside
the action lines on a given chart when the parameters have their ideal values is an assigned
amount, $\alpha$ ($\alpha = 1/500$ gives the conventional 3-sigma limits on the mean chart). Alternatively
and preferably, the action limits could be determined for each chart separately on the basis
of the average run length as described above and elsewhere (Page, 1954b), so that the losses
incurred by the scheme may be estimated. This approach, of course, will not be very accurate
for some values of the parameters since, for example, a change in the standard deviation
affects the run length of both charts. Consider a single-sample scheme for both charts so
that action is taken if the mean of any sample, $\bar{x}$, falls outside the range
$(\mu - B\sigma/\sqrt{N},\ \mu + B\sigma/\sqrt{N})$, or if the range $w$ is greater than $v_1\sigma$, where $v_1$ is suitably chosen. Since the mean
and range of samples from a normal population are independently distributed with
frequency elements $f(\bar{x} \mid \mu', \sigma')\,d\bar{x}$, $g(w \mid \sigma')\,dw$, where $\mu'$, $\sigma'$ are the process mean and standard
deviation respectively, the probability that a given sample does not cause action to be
taken is $q_1 q_2$, where

$$q_1 = \int_{\mu - B\sigma/\sqrt{N}}^{\mu + B\sigma/\sqrt{N}} f(\bar{x} \mid \mu', \sigma')\,d\bar{x} = 1 - p_1, \tag{15}$$

$$q_2 = \int_{0}^{v_1\sigma} g(w \mid \sigma')\,dw = 1 - p_2. \tag{16}$$




Clearly $q_1$, $q_2$ are the probabilities that the mean and range respectively fall within the
action lines. The run length of the charts together is distributed as the smaller of two
independent geometric variates with parameters $p_1$, $p_2$; accordingly, the combined run length,
$l$, is a geometric variable with parameter $1 - q_1 q_2$:

$$P[l = rN] = (q_1 q_2)^{r-1}(1 - q_1 q_2). \tag{17}$$

The average run length is thus

$$L = \frac{N}{1 - q_1 q_2}, \tag{18}$$

or, in terms of the average run lengths of the charts separately, is

$$L = \frac{L_1 L_2}{L_1 + L_2 - N}, \tag{19}$$


with an obvious notation. A numerical example is given in the next section. 
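The relation between (18) and (19) can be checked numerically. The following is a minimal sketch, assuming that the separate average run lengths are $L_1 = N/p_1$ and $L_2 = N/p_2$ (run lengths counted in articles); the function names are illustrative and not part of the paper.

```python
def combined_arl(q1: float, q2: float, n_sample: int) -> float:
    """Average run length (in articles) of both charts used together, as in (18)."""
    return n_sample / (1.0 - q1 * q2)


def combined_arl_from_separate(l1: float, l2: float, n_sample: int) -> float:
    """The same quantity expressed through the separate run lengths, as in (19)."""
    return l1 * l2 / (l1 + l2 - n_sample)


# Illustrative values: each chart signals with probability 1/500 per sample.
N = 5
p1 = p2 = 1.0 / 500.0
L1, L2 = N / p1, N / p2

L_direct = combined_arl(1.0 - p1, 1.0 - p2, N)
L_from_parts = combined_arl_from_separate(L1, L2, N)
print(L1, L_direct, L_from_parts)
```

With equal run lengths on the separate charts the combined value comes out close to half of either, which agrees with the remark made in the numerical section.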

We turn now to investigate what control can be achieved by the use of the mean chart
only, thus avoiding the need to calculate and plot ranges. A possible scheme makes use of 
the warning lines in another way. Consider the rule: 

V. Choose n, N. Plot the means of samples of size N on a chart on which are drawn two 
warning and two action lines. Take action if: 

(i) any point falls outside the action lines, 
or (ii) n consecutive points fall outside the warning lines, 
or (iii) two out of any set of n consecutive points fall outside opposite warning lines, 


A sequence of n points outside a warning line is evidence that the process mean has moved
in the corresponding direction; similarly, the occurrence of the means of two near samples 
outside opposite warning lines points to an increase in the spread of the distribution. This 
scheme differs from one based on the range in that it depends on the variation between 
samples and not on that within samples. Such a scheme cannot be expected to be very 
sensitive in detecting increases in the standard deviation and will become less sensitive as 
the sample size increases; however, it may serve to keep a check on the standard deviation 
when control of the mean is of prime importance. In order to evaluate the average run
length, $L_0$, of rule V let $L_i$ ($L_i'$), $i = 1, \ldots, n-1$, be the average further number of articles
drawn before action is taken when the last $i$ sample points have fallen between the upper
(lower) warning and action lines.

Let the probabilities that a sample point falls between the upper (lower) warning and
action lines be $r$ ($s$). Then in the notation of §3 we have $r + s = p_1$. By considering
expectations conditional upon the result of the first sample we have

$$L_0 = p_0(L_0 + N) + r(L_1 + N) + s(L_1' + N) + p_2 N.$$
Therefore
$$L_0 = p_0 L_0 + r L_1 + s L_1' + N. \tag{20}$$

Again, by considering expectations conditional on the result of the next sample when
the last $i$ points have fallen between the upper warning and action lines, we obtain

$$L_i = p_0 L_0 + r L_{i+1} + N \quad (i = 1, 2, \ldots, n-2), \tag{21}$$
$$L_{n-1} = p_0 L_0 + N. \tag{22}$$


A similar set of equations is obtained for the $L_i'$. Hence we have
$$L_i = \frac{(N + p_0 L_0)(1 - r^{n-i})}{1 - r}, \tag{23}$$
and similarly
$$L_i' = \frac{(N + p_0 L_0)(1 - s^{n-i})}{1 - s}. \tag{24}$$
These values substituted in (20) yield the solution
$$L_0 = \frac{(1 - rs - r^n - s^n + s r^n + r s^n)\,N}{p_2 + rs(1 + p_0) + p_0(r^n + s^n - s r^n - r s^n)}. \tag{25}$$
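The solution (25) can be checked against a direct back-substitution of equations (20) to (22). The sketch below assumes that $p_0 + r + s + p_2 = 1$, with $p_2$ the probability of a point outside the action lines, and uses a normal process at its ideal values with action lines at 3 and warning lines at 2 standard errors purely as an illustration; the function names are hypothetical.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def arl_closed_form(p0: float, r: float, s: float, n: int, N: int) -> float:
    """Average run length of rule V from the closed expression (25)."""
    p2 = 1.0 - p0 - r - s  # assumed: probability of a point outside the action lines
    num = (1 - r * s - r**n - s**n + s * r**n + r * s**n) * N
    den = p2 + r * s * (1 + p0) + p0 * (r**n + s**n - s * r**n - r * s**n)
    return num / den


def arl_by_substitution(p0: float, r: float, s: float, n: int, N: int) -> float:
    """Solve (20)-(22) by writing each L_i as a + b * L_0 and substituting back."""
    def first_state(prob: float):
        a, b = float(N), p0                  # L_{n-1} = N + p0 * L_0, eq. (22)
        for _ in range(n - 2):               # step down from L_{n-1} to L_1, eq. (21)
            a, b = N + prob * a, p0 + prob * b
        return a, b

    a_up, b_up = first_state(r)
    a_lo, b_lo = first_state(s)
    # Eq. (20): L_0 = p0 L_0 + r L_1 + s L_1' + N, solved for L_0.
    return (N + r * a_up + s * a_lo) / (1.0 - p0 - r * b_up - s * b_lo)


B1, B2, n, N = 3.0, 2.0, 3, 5
p0 = normal_cdf(B2) - normal_cdf(-B2)
r = s = normal_cdf(B1) - normal_cdf(B2)
closed = arl_closed_form(p0, r, s, n, N)
direct = arl_by_substitution(p0, r, s, n, N)
print(closed, direct)
```

Both routes should agree to rounding error; the back-substitution is the more convenient form if the rule is later varied.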


A similar rule for simultaneously controlling the mean and standard deviation is provided
by rule III without modification. If the two ‘warning’ points causing action to be taken 
are on the same side of the ideal mean a shift in the mean in that direction will be suspected, 
while if they are on opposite sides an increase in the standard deviation will be suspected. 
Of course with this rule, if a change in the standard deviation occurs and action is taken 
because of points outside the warning lines, the two points are just as likely to be on the 
same side of the ideal mean as they are to be on opposite sides. Again, action could be taken 
after a change in the standard deviation because of a point outside the action lines.
Consequently the wrong sort of change is more likely to be suggested by the scheme when a
change in the standard deviation happens. This is of little importance if the rectifying action
is independent of the suspected type of change. However, the correct inference is more 
likely to be drawn with rule V than with rule III. 

The rules of this section have nothing corresponding to the lower control limit for the 
range sometimes drawn in the conventional method. The purpose of such a line is to enable 
the occurrences of samples with small range to be investigated in the hope that it will be
possible to reduce the process standard deviation. It would, of course, be possible to
introduce a condition calling for investigation if two successive samples both had means
differing little from the required process mean; however, we shall not consider the
consequences of such a complication.


7. NUMERICAL EXAMPLES OF SIMULTANEOUS CONTROL 


Consider schemes to control the mean and standard deviation of a normal population. 
Suppose that both mean and range charts with action lines only are kept so that the average 
run length, L, of the scheme is given by equations (18) and (19). If the action lines are chosen 
so that the process inspection schemes formed by taking each of the charts on its own have
equal average run lengths when both the process mean and standard deviation are at their 
ideal values, then the average run length of the rule using both charts with these parameter 
values is approximately half that of the separate rules. For different values of the mean 
while the standard deviation is the same, the run length of the range chart is unchanged; 
consequently the run length of the combination is a little less than that of the mean chart, 
and approaches it as the change in the mean increases. On the other hand, for different 
values of the standard deviation but the same mean, both $L_1$ and $L_2$ are changed. The
average run lengths for a specific scheme with mean $\mu'$ and standard deviation $\sigma'$ are shown
in Tables 3 and 4. Let the action limits be drawn in the 'conventional' positions, i.e. so
that the probability that a sample taken from ideal quality gives a point outside the action
line is 1/500 for each chart. Thus the lines on the mean chart are drawn at $\mu \pm 3.09\sigma/\sqrt{N}$, and
for samples of $N = 5$ the range chart has a line at $w_1 = 5.24\sigma$. We then have


$$q_1 = \int_{(-B - \Delta\sqrt{N})/K}^{(B - \Delta\sqrt{N})/K} (2\pi)^{-1/2} e^{-x^2/2}\,dx, \tag{26}$$

$$q_2 = \int_0^{w_1/(K\sigma)} g(u \mid 1)\,du. \tag{27}$$
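Equation (26) is easily evaluated with the normal integral. The following sketch assumes the parametrization $\mu' = \mu + \Delta\sigma$, $\sigma' = K\sigma$ used in the tables, and treats the mean chart alone; the range-chart probability (27) requires the distribution of the range and is not attempted here. The function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def q1_mean_chart(B: float, delta: float, K: float, N: int) -> float:
    """Probability that a sample mean falls within the action lines, as in (26)."""
    upper = (B - delta * sqrt(N)) / K
    lower = (-B - delta * sqrt(N)) / K
    return normal_cdf(upper) - normal_cdf(lower)


def arl_mean_chart(B: float, delta: float, K: float, N: int) -> float:
    """Average run length (in articles) of the mean chart with action lines only."""
    return N / (1.0 - q1_mean_chart(B, delta, K, N))


# Conventional limits at 3.09 standard errors, samples of five.
p_ideal = 1.0 - q1_mean_chart(3.09, 0.0, 1.0, 5)
arl_ideal = arl_mean_chart(3.09, 0.0, 1.0, 5)
arl_shifted = arl_mean_chart(3.09, 0.5, 1.0, 5)
arl_spread = arl_mean_chart(3.09, 0.0, 1.5, 5)
print(p_ideal, arl_ideal, arl_shifted, arl_spread)
```

At the ideal values the signalling probability is close to the nominal 1/500, and both a shift in the mean and an increase in the standard deviation shorten the run length of the mean chart.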


For comparison, the average run lengths of one of the combined rules are shown in the
tables; the scheme chosen has the same sample size, $N = 5$, and $B_1 = 3.00$, $B_2 = 2.00$, $n = 3$.
With these values the combined rule has approximately the same average run length on
the ideal quality as the scheme based upon both mean and range charts. It is seen that the
rule gives fair control against changes in the standard deviation, and for changes in the
mean has a smaller average run length than the rule using the two charts. Consequently
when it is necessary only to keep an eye on the standard deviation while controlling the 
mean of a normal distribution a scheme of type V may be suitable. If so, the labour of 
keeping both mean and range charts can be avoided. 


Table 3. Average run lengths of schemes, $N = 5$, $\mu' = \mu + \Delta\sigma$, $\sigma' = \sigma$

[Table body not recovered; the surviving column heading is 'Combined rule'.]



A short table of schemes of type V is given in the Appendix, Table 2, for controlling a 
normal population with samples of five measurements, using at most a sequence of three 
such samples. The average run lengths for changes in the mean are shown in Table 2a, and
for changes in the standard deviation in Table 2b.


I wish to thank Dr D. R. Cox for many helpful discussions on the subject of this paper,
and the Director, Mathematical Laboratory, Cambridge, for permission to use the EDSAC
for the calculation of the tables.


REFERENCES 


AROIAN, L. A. & LEVENE, H. (1950). J. Amer. Statist. Ass. 45, 520.
BARNARD, G. A. (1946). J.R. Statist. Soc. Suppl. 8, 1.
BARTLETT, M. S. (1953). Proc. Camb. Phil. Soc. 49, 263.
DUDDING, B. P. & JENNETT, W. J. (1944). Quality Control Chart Technique. London: General Electric.
FELLER, W. (1950). Probability Theory and its Application. New York: Wiley.
JENNETT, W. J. & WELCH, B. L. (1939). J.R. Statist. Soc. Suppl. 6, 80.
PAGE, E. S. (1954a). Biometrika, 41, 100.
PAGE, E. S. (1954b). J.R. Statist. Soc. B, 16, 131.
SHEWHART, W. A. (1931). Economic Control of Quality of Manufactured Product. New York: Macmillan.
WALD, A. (1947). Sequential Analysis. New York: Wiley.
WEILER, H. (1953). J. Amer. Statist. Ass. 48, 816.


APPENDIX
Tables of average run lengths of rules IV
Charts with action lines at $\mu \pm B_1\sigma/\sqrt{N}$, warning lines at $\mu \pm B_2\sigma/\sqrt{N}$.


Table 1a. N = 5


Е С. Ч ВЕЛЕ! E 
| | 
B,- 00 xp мв “loza 1598 е | 52; 11900 1002 1880 1850 00 
300 02 137 вәт 751 B6 | X6 593 7 572 887 02 
04 | в 102 о э эй 112 5 26 ә 280 04 
| | 06 32 43 51 15 90 4 62 79 93 100 06 
| 0-8 19 23 27 3 38 25 29 м 39 42 08 
10 | 14 15 16 18 20 16 17 19 20 21 10 
12 11 n 12 13 13 12 12 12 13 13 12 
| 14 9 9 9 9 9 8 8 8 9 9 14 
| 16 1 1 1 1 1 1 1 1 7 1 16 
18 6 6 6 6 6 6 6 6 6 6 16 
В, 0-0 208 549 1390 2051 2001 
34195 | 02 149 32 (63; 1008 1213 
04 6 ш 184 274 
06 33 45 63 81 110 
| 08 20 24 29 36 
| 10 14 16 17 19 22 
12 1 12 12 13 13 
14 9 9 9 9 9 
16 1 1 1 1 1 
18 6 6 6 6 6 
Be 0-0 213 585 
3.95 02 146 3M 
04 68 18 
0-6 35 48 
08 21 25 
10 15 16 
12 12 12 
14 9 9 
16 8 8 
1:8 6 6 




APPENDIX (cont.) 
Table 1с. N=15 




APPENDIX (cont.) 
Tables of average run lengths of rules V 
Charts with action lines at $\mu \pm B_1\sigma/\sqrt{N}$, warning lines at $\mu \pm B_2\sigma/\sqrt{N}$. Sample size, N = 5.
Table 2a. $\mu' = \mu + \Delta\sigma$, $\sigma' = \sigma$. Table 2b. $\mu' = \mu$, $\sigma' = K\sigma$.


115 200 




MISCELLANEA 


Approximations to the probability integral and certain percentage points of a 
multivariate analogue of Student's ¢-distribution* 


By C. W. DUNNETT† AND M. SOBEL‡
Cornell University 


In a recent paper (Dunnett & Sobel, 1954), a multivariate analogue of Student's t-distribution was
defined as the joint distribution of p variates $t_i = z_i/s$ ($i = 1, 2, \ldots, p$). Here the $z_i$ have a non-singular
multivariate normal distribution with means 0, common unknown variance $\sigma^2$ and known correlation
matrix $\{\rho_{ij}\}$, and $ns^2/\sigma^2$ has a $\chi^2$-distribution with n degrees of freedom, independent of the $z_i$. The joint
density of the $t_i$ is given by


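$$f(t_1, \ldots, t_p) = \frac{A^{1/2}\,\Gamma\{\tfrac12(n+p)\}}{(n\pi)^{p/2}\,\Gamma(\tfrac12 n)} \left[1 + \frac{1}{n}\sum_{i=1}^{p}\sum_{j=1}^{p} a_{ij} t_i t_j\right]^{-\frac12(n+p)}, \tag{1}$$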


where A is the determinant of the positive definite matrix $\{a_{ij}\} = \{\rho_{ij}\}^{-1}$. In the authors' previous paper
expressions and tables for the probability integral and equi-co-ordinate percentage points of (1) were
obtained for the bivariate case (p = 2). In that paper an equi-co-ordinate P-percentage point was
defined as the value of h for which

$$\int_{-\infty}^{h} \cdots \int_{-\infty}^{h} f(t_1, \ldots, t_p)\,dt_1 \cdots dt_p = P. \tag{2}$$


In this note we shall derive approximations (which are also lower bounds) to the probability integral 
of (1) applicable in special cases when p> 2. These results then can be used to obtain approximations 
(which are also upper bounds) to any equi-co-ordinate P-percentage point. Equation (11) below shows 
that the probability integral table (Table 1) of the previous paper can be used when $\rho_{ij} = \tfrac12$ ($i \ne j$) to
obtain numerically the approximations referred to above.

Letting $I = I_n(h_1, h_2, \ldots, h_p)$ denote the left-hand member of (2) with the upper limit of $t_i$ replaced by
$h_i$ ($i = 1, 2, \ldots, p$), we have

$$I = \Pr\{t_1 < h_1, \ldots, t_p < h_p\} = \Pr\{z_1 < h_1 s, \ldots, z_p < h_p s\}. \tag{3}$$
Fixing s as the last variable to be integrated, we can write

$$I = \int_0^\infty G_p(h_1 s, \ldots, h_p s; \{\rho_{ij}\})\, f_n(s)\,ds, \tag{4}$$

where $G_p = G_p(x_1, \ldots, x_p; \{\rho_{ij}\})$ is the c.d.f. of the standardized p-variate normal distribution with
correlation matrix $\{\rho_{ij}\}$, and $f_n(s)$ is the probability density function of s with n degrees of freedom.


ASSUMPTION 1. The matrix $\{\rho_{ij}\}$ has the structure $\rho_{ij} = b_i b_j$ ($i \ne j$), where $0 \le b_i < 1$ ($i = 1, 2, \ldots, p$).
It follows that $\{\rho_{ij}\}$ is positive definite, since the associated quadratic form $\sum (1 - b_i^2) x_i^2 + (\sum b_i x_i)^2$ is
positive for $x_i$ not all zero.


ASSUMPTION 2. The upper limits of integration in (3) are non-negative, i.e. $h_i \ge 0$ ($i = 1, 2, \ldots, p$).

We note that when $\rho_{ij} = \rho \ge 0$ ($i \ne j$), Assumption 1 is satisfied since this occurs when $b_i = \sqrt{\rho}$
($i = 1, 2, \ldots, p$).

If we let $y_0, y_1, \ldots, y_p$ denote independent, normally distributed chance variables with zero means
and unit variances, and let $c_i = (1 - b_i^2)^{1/2}$, then the joint distribution of the chance variables

$$z_i = c_i y_i - b_i y_0 \quad (i = 1, 2, \ldots, p)$$
is a standardized p-variate normal distribution with correlation matrix $\{\rho_{ij}\}$.


* This research was supported in part by the United States Air Force, through the Office of Scientific 
Research of the Air Research and Development Command. 

† Now at Lederle Laboratories, Pearl River, N.Y.

‡ Now at Bell Telephone Laboratories, Allentown, Pa.


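Consider the function $G_p$ in (4) above, hold s fixed, and let $x_i = h_i s$ ($i = 1, 2, \ldots, p$). Then
$$G_p = \Pr\{c_i y_i - b_i y_0 < x_i\ (i = 1, 2, \ldots, p)\} = \int_{-\infty}^{\infty} \prod_{i=1}^{p} G\!\left(\frac{x_i + b_i y}{c_i}\right) g(y)\,dy, \tag{5}$$
where $g(y)$ is the standard univariate normal density and $G(y)$ is its c.d.f.
It can easily be shown (see, for example, Kimball, 1951) that for any r nondecreasing, bounded
functions $F_i(x)$ ($i = 1, 2, \ldots, r$) of a chance variable x we have (letting E denote expectation)

$$E\Big\{\prod_{i=1}^{r} F_i(x)\Big\} \ge \prod_{i=1}^{r} E\{F_i(x)\}. \tag{6}$$
Applying (6) to (5) gives
$$G_p \ge \prod_{i=1}^{p} \int_{-\infty}^{\infty} G\!\left(\frac{x_i + b_i y}{c_i}\right) g(y)\,dy = \prod_{i=1}^{p} \Pr\{c_i y_i - b_i y_0 < x_i\} = \prod_{i=1}^{p} G(x_i). \tag{7}$$
Substituting this result in (4) and applying (6) again gives
$$I \ge \int_0^\infty \prod_{i=1}^{p} G(h_i s)\, f_n(s)\,ds \ge \prod_{i=1}^{p} \int_0^\infty G(h_i s)\, f_n(s)\,ds = \prod_{i=1}^{p} \Pr\{t_i < h_i\}. \tag{8}$$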

This lower bound to I does not depend on $\{\rho_{ij}\}$ and is easily calculated from tables of the c.d.f. of the
univariate Student t-distribution with n degrees of freedom.

Under Assumptions 1 and 2 a sharper (i.e. higher) lower bound can be obtained by obvious modifications
of the above argument. This lower bound depends on the c.d.f. of the bivariate t-distribution
considered by Dunnett & Sobel (1954), which is tabulated there for the special case $\rho = \tfrac12$ and $h_1 = h_2 \ge 0$.
The results are, for even $p \ge 2$,
$$I \ge \prod_{i=1}^{p/2} \Pr\{t_{2i-1} < h_{2i-1},\ t_{2i} < h_{2i}\}, \tag{9}$$
and, for odd $p \ge 3$,
$$I \ge \Pr\{t_1 < h_1\} \prod_{i=1}^{(p-1)/2} \Pr\{t_{2i} < h_{2i},\ t_{2i+1} < h_{2i+1}\}. \tag{10}$$


If we replace Assumptions 1 and 2 above by

ASSUMPTION 1'. The matrix $\{\rho_{ij}\}$ has the structure $\rho_{ij} = \rho$ ($i \ne j$), where $0 \le \rho < 1$. Clearly, $\{\rho_{ij}\}$ is
positive definite.

ASSUMPTION 2'. The upper limits of integration in (3) are all equal, i.e. $h_i = h$ ($i = 1, 2, \ldots, p$);
then we can replace (9) and (10) by a single inequality and write for any integer $p \ge 2$

$$I \ge [\Pr\{t_1 < h,\ t_2 < h\}]^{p/2}, \tag{11}$$

which is sharper than (10) (when p is odd). The proof of (11) is similar to the above proof, using, instead
of (6), the well-known inequality (see, for example, Cramér, 1946, p. 176)

$$\beta_p^{1/p} \ge \beta_q^{1/q} \quad \text{for } p \ge q, \tag{12}$$
where $\beta_k$ denotes the kth absolute moment.
Paulson (1952) suggests an alternative method of obtaining an approximation and lower bound
to (3) which is based on the Bonferroni inequality
$$\Pr\{t_1 < h_1, \ldots, t_p < h_p\} \ge 1 - \sum_{i=1}^{p} \Pr\{t_i > h_i\}. \tag{13}$$


This method has the advantage that it requires no assumptions. However, when the inequalities (8)
through (11) hold, then (8) and hence also (9), (10) and (11) give sharper lower bounds than (13). To show
this for (8) let
$$1 - q_i = \Pr\{t_i < h_i\}, \tag{14}$$
so that $0 \le q_i \le 1$. Then we have to prove that
$$\prod_{i=1}^{p} (1 - q_i) \ge 1 - \sum_{i=1}^{p} q_i. \tag{15}$$

This certainly holds for p = 1, and a straightforward mathematical induction shows that (15) holds
for all positive integers p.
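Inequality (15) is also easy to confirm numerically. The sketch below, with illustrative function names, compares the product bound of (8) with the Bonferroni bound of (13) for a grid of tail probabilities $q_i$ between 0 and 1.

```python
from itertools import product
from math import prod


def product_bound(q):
    """Lower bound of the form (8): product of the marginal probabilities 1 - q_i."""
    return prod(1.0 - qi for qi in q)


def bonferroni_bound(q):
    """Lower bound of the form (13): one minus the sum of the tail probabilities."""
    return 1.0 - sum(q)


grid = [0.0, 0.01, 0.05, 0.25, 0.5, 1.0]
worst_gap = min(
    product_bound(q) - bonferroni_bound(q)
    for p in (1, 2, 3, 4)
    for q in product(grid, repeat=p)
)
print(worst_gap)  # never negative, so the product bound is at least as sharp
```

The gap is zero when at most one $q_i$ is non-zero and grows with the number and size of the $q_i$, which is where the product bound is most useful.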




The approximations (13), (8), (9), (10) and (11) can be used to obtain upper bounds to the
equi-co-ordinate percentage points of (1) for p > 2. Table 1 compares the approximations with the exact
values for the special cases $\rho_{ij} = \tfrac12$ ($i \ne j$); n = 5, ∞; p = 3, 9 and P = 0.50, 0.75, 0.95, 0.99; all entries are
rounded to two decimal places. For n = ∞ columns (13) and (8) require a table of the
univariate normal c.d.f.; columns (10), (11) and the exact values were obtained from unpublished tables
of the National Bureau of Standards (1953). For n = 5, columns (13) and (8) require a table of the
univariate Student c.d.f. with 5 degrees of freedom; columns (10) and (11) and the exact values were
computed by numerical integration, i.e. by applying Simpson's rule and using the National Bureau of
Standards tables (1953) to evaluate the integral in (4). The exact values for P = 0.95, 0.99 will also appear
in another paper by Dunnett where equi-co-ordinate percentage points for the case $\rho_{ij} = \tfrac12$ ($i \ne j$) will be
given for p = 3(1)9, P = 0.95, 0.99 and n = 5(1)20, 24, 30, 40, 60, 120, ∞.


Table 1. Comparison of exact equi-co-ordinate percentage points with 
approximations for selected values of n, p and P 


Approximations 


For each of the approximations (13), (8), (10) and (11) in Table 1 it is conjectured that further calcula- 
tion will establish the following properties for the difference D = D(n, p, P) between the approximation 
and its exact value: 

(i) D is increasing with p for each n and P,
(ii) D is decreasing with n for each p and P,
(iii) D is decreasing with P for n = ∞ and each p,
(iv) D is parabolic-shaped with P for n = 5 and each p.

The values of n, p and P in Table 1 were selected to cover a wide range of practical interest. Since only
a limited number of exact values for finite n are known, the inequalities considered in this paper, which
have a fairly wide application, should prove to be useful.


REFERENCES 


CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
DUNNETT, C. W. A multiple comparisons test for comparing several treatments with a control.
Unpublished manuscript.
DUNNETT, C. W. & SOBEL, M. (1954). Biometrika, 41, 153.
KIMBALL, A. W. (1951). Ann. Math. Statist. 22, 600.
NATIONAL BUREAU OF STANDARDS (1953). Personal communication on unpublished tables.
PAULSON, E. (1952). Ann. Math. Statist. 23, 239.




Galton's rank-order test 


By J. L. HODGES, Jr.
University of California, Berkeley 


1. One of the first uses of rank-order in statistics was that of Galton in the study of data referred to
him by Charles Darwin (1876). A quantity was measured on each of n treated subjects and also on each
of n control subjects, obtaining measurements $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ respectively. The 2n
measurements were arranged in common increasing order of size, and Galton counted the number, say G,
of times that an x of given rank exceeded the y of the same rank. In Galton's case, n was 15 and G was 13,
which was regarded as evidence that the first sample came from a population stochastically larger than
that from which the second sample came. In modern language, if G is sufficiently large, we reject the
null hypothesis that the treatment is without effect, in favour of the alternative that the treatment tends
to increase the measurements.*


2. Galton was not able to attach a significance level to his observation, inasmuch as he did not know
the distribution of G under the null hypothesis. However, that distribution has recently been discovered
in another connexion and proved to be very simple and elegant. Consider the usual penny-tossing game
played by Peter and Paul, in which Peter pays one unit to Paul if the penny lands 'heads' and Paul
pays one unit to Peter if it lands 'tails'. Suppose we are given that after 2n tosses, the contestants are
even. Let F denote the number of times that Peter was in the lead (where conventionally we assert that,
when the game is even, the player leads who led on the preceding toss). If we identify Peter's winning
the kth toss with the event that the kth measurement in the Galton problem is an x, it appears that
F = 2G.

The conditional distribution of F has been found by Chung & Feller (1949) to be uniform, under the
hypothesis that Peter's n victories are randomly distributed among the 2n trials. But this is just the
null hypothesis of the Galton problem.
Therefore we may assert, under the null hypothesis, that $P(G \ge g) = (n - g + 1)/(n + 1)$. For example, to
Galton's observation we may attach the significance probability 3/16.
3. The proof of Chung & Feller is by means of a double generating function, and it may be of interest
to have an enumerative proof of so simple a result. Consider the class of $\binom{2n}{n}$ possible arrangements
of n 0's and n 1's, and for each such arrangement determine its score g by counting those 1's which precede
the 0 of the same ordinal number. For example, in the sequence 01110001 the second and third 1's are
counted, since the second 1 precedes the second 0, the third 1 precedes the third 0, but the first and
fourth 1's are not counted as they follow the first and fourth 0's respectively. The sequence just given
has a total score g = 2. For a sequence of length 2n, the possible values of g are 0, 1, ..., n. We shall
denote the score of a sequence $(a_1 \ldots a_{2n})$ by $[a_1 \ldots a_{2n}]$.
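The score is simple to compute, and for small n the whole class of arrangements can be enumerated. The following sketch (function names are illustrative) reproduces the score of the example sequence, tabulates the number of arrangements having each score, and evaluates the significance probability that follows when G is uniform over 0, 1, ..., n.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def score(seq):
    """Number of 1's that precede the 0 of the same ordinal number."""
    zeros = [i for i, a in enumerate(seq) if a == 0]
    ones = [i for i, a in enumerate(seq) if a == 1]
    return sum(1 for one_pos, zero_pos in zip(ones, zeros) if one_pos < zero_pos)


def score_counts(n):
    """Count the arrangements of n 0's and n 1's by score."""
    counts = Counter()
    for ones_positions in combinations(range(2 * n), n):
        seq = [0] * (2 * n)
        for i in ones_positions:
            seq[i] = 1
        counts[score(seq)] += 1
    return counts


def upper_tail(n, g):
    """P(G >= g) when G is uniform over 0, 1, ..., n."""
    return Fraction(n - g + 1, n + 1)


example_score = score([0, 1, 1, 1, 0, 0, 0, 1])
counts_n3 = score_counts(3)
counts_n4 = score_counts(4)
galton_p = upper_tail(15, 13)
print(example_score, dict(counts_n3), dict(counts_n4), galton_p)
```

For n = 3 and n = 4 each score occurs equally often, so the enumeration agrees with the uniform distribution found by Chung & Feller.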

THEOREM. The number of arrangements with score x is independent of x.

We shall prove this by defining a mapping $T_n$ which has the following properties:

(a) The domain of $T_n$ is the set of all sequences of length 2n whose score is positive.

(b) The range of $T_n$ is the set of all sequences of length 2n whose score is less than n.

(c) $T_n$ is a 1-1 function.

(d) $[T_n(a_1 \ldots a_{2n})] = [a_1 \ldots a_{2n}] - 1$.

Such a mapping carries the arrangements with score x one-to-one on to those with score x − 1, which
proves the theorem. For any arrangement $(a_1, \ldots, a_{2n})$ let k be the smallest integer for which
$(a_1, \ldots, a_{2k})$ contains equal numbers of 0's and 1's (this corresponds in coin-tossing to the first toss
at which the players are even). We shall say that the sequence 'breaks' at 2k, and note that
$$[a_1 \ldots a_{2n}] = [a_1 \ldots a_{2k}] + [a_{2k+1} \ldots a_{2n}]. \tag{1}$$
* This has been extensively reviewed by Fisher (1945, Ch. III), who points out that
Darwin did not have two independent samples of n, but n matched pairs. It may be noted that the pairing
did not serve to reduce the variance; in fact, if we test the hypothesis that there is no pair effect we have
… = 0.554. In any case, our present concern is with the problem of two samples of n.



Note that k may equal n. As 2k is the first equilibrium point, there cannot have been an earlier change of
lead, so $[a_1 \ldots a_{2k}]$ must be either 0 or k. Further, if $[a_1 \ldots a_{2k}] = k$, we must have $a_1 = 1$, $a_{2k} = 0$ and
$[a_2 \ldots a_{2k-1}] = k - 1$.

We now define the functions $T_n$ inductively. Let $T_1(10) = (01)$ and check properties (a)-(d) for this
case. Suppose we have defined $T_m$ for $m < n$ satisfactorily. For any $(a_1 \ldots a_{2n})$ of positive score, let
$T_n(a_1 \ldots a_{2n})$ be

(i) $(a_1 \ldots a_{2k}\, T_{n-k}(a_{2k+1} \ldots a_{2n}))$ if $[a_{2k+1} \ldots a_{2n}] > 0$,

(ii) $(0\, a_{2k+1} \ldots a_{2n}\, 1\, a_2 \ldots a_{2k-1})$ if $[a_{2k+1} \ldots a_{2n}] = 0$.

We must check (a)-(d), of which (a) is obvious. Condition (d) holds for (i) by the induction hypothesis
and (1). As for (ii), if $[a_{2k+1} \ldots a_{2n}] = 0$, we must have

$$0 < [a_1 \ldots a_{2n}] = k = [a_1 \ldots a_{2k}], \quad [0\, a_{2k+1} \ldots a_{2n}\, 1] = 0, \quad [a_2 \ldots a_{2k-1}] = k - 1 = [T_n(a_1 \ldots a_{2n})].$$

From (d) it appears that the range of $T_n$ is contained in the set of sequences with score less than n; any
such sequence has a $T_n$-inverse under (i) if $[a_{2k+1} \ldots a_{2n}] < n - k$, since $T_{n-k}$ satisfies the conditions,
and under (ii) otherwise, since when $[a_{2k+1} \ldots a_{2n}] = n - k$ we have

$$k > [a_1 \ldots a_{2k}] = 0, \quad a_1 = 0, \quad a_{2k} = 1, \quad [a_2 \ldots a_{2k-1}] = 0, \quad (a_1 \ldots a_{2n}) = T_n(1\, a_{2k+1} \ldots a_{2n}\, 0\, a_2 \ldots a_{2k-1}).$$
The ranges of (i) and (ii) are disjoint and each is invertible, so (c) holds.


4. While the Galton test as defined above applies only when the two samples are of equal size, there
is a natural extension to the more general case. For example, let $n_1 = 3$, $n_2 = 11$. The third, sixth and
ninth ranking observations in the larger sample divide it into four segments, each containing two
observations. If we regard these three observations as 'representing' the larger sample, we may calculate
G as before between the two samples of three each, and it is clear that G thus defined has the uniform
distribution, since to each arrangement for it there corresponds the same number (27) of arrangements
of the original problem. In general, if $n_2 = n_1 + k(n_1 + 1)$, we may represent the second sample by its
observations of rank $k+1, 2k+2, \ldots, n_1 k + n_1$, and obtain a G which is uniformly distributed over the
values $0, 1, \ldots, n_1$.

If $n_2 - n_1$ does not happen to be divisible by $n_1 + 1$, the above method will not apply exactly, but we
may still obtain a uniformly distributed test statistic by randomization. The representative values of
the larger sample may be chosen so that the segments into which they partition it differ by at most one
in size. If we select one among all such partitionings at random, the resulting G may be shown to be
uniformly distributed. Because of the natural repugnance of randomized decisions in such problems,
it is probably preferable instead to associate with each arrangement the distribution of G values it has
among the partitions, and choose for its definitive G the integer nearest the expectation of this
distribution.

An interesting result follows if we let $n_2 \to \infty$. The population from which the second sample was
chosen is then known, and we are dealing with the one sample rather than the two sample problem. Our
statistic is now definable as the number of the quantities y; — F-(n, + 1 — k/n, + 1) which are positive. 
That its distribution is still uniform may be seen by a limiting argument from the above, or by con- 
sidering it as a set of conditioned partial sums. 


REFERENCES 


CHUNG, KAI LAI & FELLER, W. (1949). On fluctuations in coin-tossing. Proc. Nat. Acad. Sci., Wash.,
35, 605-8.
DARWIN, CHARLES (1876). The Effects of Cross- and Self-fertilization in the Vegetable Kingdom. London:
John Murray.
FISHER, R. A. (1945). The Design of Experiments, 4th ed. Edinburgh: Oliver and Boyd.



On bounds for the normal integral* 


By JOHN Т. CHU 
Institute of Statistics, University of North Carolina 


1. Let
$$v = (2\pi)^{-1/2} \int_0^x e^{-t^2/2}\,dt \quad (x \ge 0). \tag{1}$$

G. Pólya (1949) and J. D. Williams (1946) proved independently that
$$v \le \tfrac12 \left[1 - e^{-2x^2/\pi}\right]^{1/2}. \tag{2}$$

Two simple questions follow naturally: (i) Is it possible to replace the constant $2/\pi$ in (2) by a smaller
quantity without breaking the inequality? (ii) Does there exist a lower bound, in a similar form, for v?
We find the following answer.

If for all x > 0, the integral v given by (1) satisfies

$$\tfrac12 (1 - e^{-ax^2})^{1/2} \le v \le \tfrac12 (1 - e^{-bx^2})^{1/2}, \tag{3}$$

then it is necessary and sufficient that $0 \le a \le \tfrac12$ and $b \ge 2/\pi$.
The proof of this statement is simple. First,

$$\lim_{x \to 0} 4v^2/(1 - e^{-bx^2}) = 2/(\pi b).$$

Hence if (3) is true, $b \ge 2/\pi$. On the other hand (3) implies $x^2/[-\log(1 - 4v^2)] \le 1/a$ for all real x. Since
the limit of this ratio, as $x \to \infty$, is 2, we have $a \le \tfrac12$. Finally

$$4v^2 = (2\pi)^{-1} \int_{-x}^{x}\!\int_{-x}^{x} e^{-\frac12(s^2 + t^2)}\,ds\,dt \ge (2\pi)^{-1} \int_0^{2\pi}\!\int_0^{x} e^{-\frac12 r^2}\, r\,dr\,d\theta.$$

Therefore
$$v \ge \tfrac12 (1 - e^{-x^2/2})^{1/2}. \tag{4}$$
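The two bounds are readily checked against the normal integral itself. The following is a small sketch, using the identity $v = \tfrac12\,\mathrm{erf}(x/\sqrt{2})$ and illustrative function names, which evaluates the lower bound (4) and the upper bound (2) on a grid of x.

```python
from math import erf, exp, pi, sqrt


def v(x: float) -> float:
    """The normal integral (1): area under the standard normal curve from 0 to x."""
    return 0.5 * erf(x / sqrt(2.0))


def upper_bound(x: float) -> float:
    """Upper bound (2) of Polya and Williams."""
    return 0.5 * sqrt(1.0 - exp(-2.0 * x * x / pi))


def lower_bound(x: float) -> float:
    """Lower bound (4)."""
    return 0.5 * sqrt(1.0 - exp(-x * x / 2.0))


grid = [0.05 * k for k in range(1, 121)]  # x from 0.05 to 6.00
bounds_hold = all(lower_bound(x) <= v(x) <= upper_bound(x) for x in grid)
widest_gap = max(upper_bound(x) - lower_bound(x) for x in grid)
print(bounds_hold, widest_gap)
```

The grid is only illustrative; it does not replace the proof, but it shows how close the two bounds lie over the range of practical interest.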
Pólya showed that as x varies from 0 to ∞, the ratio of the L.H.S. (left-hand side) of (2) to the R.H.S.
decreases steadily from 1 to a minimum value and then increases steadily. Williams's calculations
… that of Pólya, it can be shown that the ratio of the L.H.S. of (4) to the R.H.S. is a steadily decreasing
function of x, its derivative having the same sign as

$$2(e^{\frac12 x^2} - 1) - x\,e^{\frac12 x^2} \int_0^x e^{-\frac12 t^2}\,dt, \tag{5}$$

which is non-positive since
$$e^{\frac12 x^2} \int_0^x e^{-\frac12 t^2}\,dt = \sum_{k=0}^{\infty} \frac{x^{2k+1}}{1 \cdot 3 \cdot 5 \cdots (2k+1)}.$$

As a consequence, we obtain that this ratio (of the L.H.S. of (4) to the R.H.S.) has an upper bound $2/\sqrt{\pi}$.


2. A different lower bound for v can be obtained easily from a result proved by Chu & Hotelling (1954).
There we showed that for all $x \ge 0$,
$$x^2(1 - 4v^2)/(4v^2) \le \tfrac12 \pi. \tag{6}$$
Hence it follows that
$$v \ge \tfrac12 \left[2x^2/(\pi + 2x^2)\right]^{1/2}. \tag{7}$$
For easy reference, we will give here a proof of (6). Let
$$g_0(x) = x^2(1 - 4v^2)/(4v^2), \tag{8}$$
then $\lim_{x \to 0} g_0(x) = \tfrac12 \pi$. We will show that $g_0(x)$ is decreasing. Let a prime denote differentiation with
respect to x. Then
$$g_0'(x) = x\,g_1(x)/(2v^3), \tag{9}$$
where
$$g_1(x) = v(1 - 4v^2) - xv'. \tag{10}$$

* Work sponsored by the Office of Naval Research under Contract NR 042 031 at Chapel Hill. 


equivalent to 


$$g_1'(x) = v'\,g_2(x),$$
where
$$g_2(x) = x^2 - 12v^2,$$
$$g_2'(x) = (12/\pi)\,e^{-\frac12 x^2}\,g_3(x),$$
where
$$g_3(x) = (\pi/6)\,x\,e^{\frac12 x^2} - \int_0^x e^{-\frac12 t^2}\,dt.$$
L] 
From (5), we Һа MEE — 
MM теле — [ei ian] c 


It can be shown, by a similar argument used by Pólya for a similar purpose, that
$$g_3(x) = x(a_0 + a_1 x^2 + a_2 x^4 + \ldots),$$

where $a_0 < 0$ and $a_i > 0$, $i = 1, 2, \ldots$. Hence there exists an $x_0 > 0$ such that $g_3(x) < 0$ if $0 < x < x_0$,
$g_3(x) > 0$ if $x > x_0$. So as x increases from 0 to ∞, $g_2(x)$ decreases steadily from 0 to a minimum and
increases steadily to ∞. Consequently $g_1(x)$ first decreases steadily and then increases steadily.
Since $\lim_{x \to 0} g_1(x) = \lim_{x \to \infty} g_1(x) = 0$, it becomes clear that $g_1(x) < 0$ for all $x > 0$. Therefore $g_0(x)$ is a
decreasing function of x. Hence we have (6).

A comparison can be made easily of the two lower bounds for v given by (4) and (7). For simplicity,
they will be denoted by $a(x)$ and $b(x)$ respectively. Now $a(x) \gtrless b(x)$ according as
$c(x) = e^{\frac12 x^2} - 2x^2/\pi - 1 \gtrless 0$.
As x varies from 0 to ∞, $c'(x)$, the derivative of $c(x)$, changes sign from negative to positive. So does $c(x)$.
If $x = x_1$ is the solution of $c(x) = 0$, then $x_1 = 1$ approximately (the exact value is slightly smaller).
Therefore, the lower bound in (7) is closer to v than that in (4) if $0 < x < 1$ (approximately) and less
close if $x \ge 1$.
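The crossing point of the two lower bounds can be located numerically. The sketch below finds the positive root of $c(x) = e^{\frac12 x^2} - 2x^2/\pi - 1$ by bisection and compares the bounds (4) and (7) on either side of it; the function names and the bracketing interval are illustrative choices.

```python
from math import exp, pi, sqrt


def c(x: float) -> float:
    """Sign function deciding which of the two lower bounds is larger."""
    return exp(x * x / 2.0) - 2.0 * x * x / pi - 1.0


def bound_4(x: float) -> float:
    """Lower bound (4), denoted a(x) in the text."""
    return 0.5 * sqrt(1.0 - exp(-x * x / 2.0))


def bound_7(x: float) -> float:
    """Lower bound (7), denoted b(x) in the text."""
    return 0.5 * sqrt(2.0 * x * x / (pi + 2.0 * x * x))


def bisect(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Plain bisection; assumes f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x1 = bisect(c, 0.5, 1.5)
print(x1, bound_7(0.5) - bound_4(0.5), bound_4(2.0) - bound_7(2.0))
```

The root lies a little below 1, in agreement with the statement that the exact value is slightly smaller than the approximation quoted.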

Further, the following statement is of similar nature to the one made in $ 1. 


If for all $x > 0$,
$$v \ge \tfrac12 \left[ax^2/(1 + ax^2)\right]^{1/2}, \tag{13}$$

then it is necessary and sufficient that $0 \le a \le 2/\pi$. On the other hand, for no finite a can the R.H.S. of (13)
be, for all $x > 0$, an upper bound for v.

The above statement can be shown easily by considering the limit, as $x \to 0$, of the ratio of v to the
R.H.S. of (13); and the limit, as $x \to \infty$, of $(1 - 4v^2)(1 + ax^2)$.


3. Several authors have derived inequalities for Mills's ratio. Their results can be written in the form
of bounds for the normal integral. For example, in our notation, Gordon's (1941) inequalities are

$$\tfrac12 - \frac{1}{x}(2\pi)^{-1/2} e^{-\frac12 x^2} \le v \le \tfrac12 - \frac{x}{x^2 + 1}(2\pi)^{-1/2} e^{-\frac12 x^2} \quad \text{for } x > 0. \tag{14}$$

Birnbaum (1942) improved Gordon's upper bound in (14) and obtained

$$v \le \tfrac12 - \tfrac12\left[(4 + x^2)^{1/2} - x\right](2\pi)^{-1/2} e^{-\frac12 x^2} \quad \text{for } x > 0. \tag{15}$$


More recently, Tate (1953) showed what amounts to 


dui ш Лны: l-e-*) f 0 16 
. (+55) 5273 "eK e-*) for 220. (16) 
We will now compare briefly (2) and (4) with (14), (15) and (16). The upper bound in (16) is obviously
not so good as that in (2). The lower bound in (16) is non-negative for all real x. It is $\gtrless$ the R.H.S. of (4)
according as $h(x) = x^2 - (8/\pi)(1 - e^{-\frac12 x^2}) \gtrless 0$, as will be seen by squaring the difference twice. As x varies
from 0 to ∞, $h(x)$ decreases steadily from 0 to a minimum and then increases steadily to ∞; and vanishes
at x = 1.01 approximately (the exact value is slightly smaller). Therefore the lower bound in (16) is
closer to v than that in (4) if and only if x > 1.01 approximately.
The lower bound in (14) is an increasing function of x for all x > 0. It is non-negative when x > 0.65
(approximately); and in this case it is $\gtrless$ the R.H.S. of (4) according as $g(x) = 2(2/\pi)^{1/2} x - x^2 - (2/\pi) e^{-\frac12 x^2} \lessgtr 0$.
As x varies from 0 to ∞, $g(x)$ increases steadily from $-2/\pi$ to a maximum and then decreases steadily
to −∞. The two roots of $g(x) = 0$ are approximately x = 0.5 and x = 1.45. Hence the lower bound in
(14) is closer to v than that of (4) if and only if x > 1.45 approximately.


Finally we point out that, for values of x close to 0, the upper bound in (2) is better than those in (14)
and (15); while for large values of x, the latter two are better. No detailed comparison is attempted.


The author wishes to thank the referee for calling his attention to R. F. Tate's work and suggesting
adding to the original note some comparison of the new and known results. Thanks are also due to
Professor Harold Hotelling for his critical reading of the manuscript.


REFERENCES


BIRNBAUM, Z. W. (1942). Ann. Math. Statist. 13, 245-6.
CHU, J. T. & HOTELLING, H. (1954). The moments of the sample median. To be published.
GORDON, R. D. (1941). Ann. Math. Statist. 12, 364-6.
PÓLYA, G. (1949). Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability,
pp. 63-78. Berkeley: University of California Press.
TATE, R. F. (1953). Ann. Math. Statist. 24, 132-4.
WILLIAMS, J. D. (1946). Ann. Math. Statist. 17, 363-5.


Substitutes for $\chi^2$
By J. B. S. HALDANE
Department of Biometry, University College, London


Neyman (1930) and Jeffreys (1948, p. 170) have suggested a substitute for $\chi^2$ involving some saving of
computation. I here suggest what I believe to be a better one. If a sample consists of N individuals
belonging to m classes, and $n_r$ belong to the rth class, the expected number on some hypothesis being


$Na_r$, where $\sum_{r=1}^{m} a_r = 1$, then
$$\chi^2 = \sum_{r=1}^{m} \frac{(n_r - Na_r)^2}{Na_r},$$
Neyman's
$$\chi'^2 = \sum_{r=1}^{m} \frac{(n_r - Na_r)^2}{n_r}.$$
I consider
$$\chi''^2 = \sum_{r=1}^{m} \frac{(n_r - Na_r)^2}{n_r + 2}.$$

Since there is a finite probability that any $n_r$ should be zero, it is clear that the expectation of $\chi'^2$
is formally infinite. I shall show that it still exceeds m − 1 even when samples in which any $n_r = 0$ are
excluded. Haldane (1953) gave reasons for preferring $n_r + 1$ as a divisor in a similar context. It can be
shown that

$$E\left[\sum_{r} \frac{(n_r - Na_r)^2 + b}{n_r + c}\right] = m - 1 + N^{-1}\left[(b - c + 2)\sum_r a_r^{-1} - (3 - c)m + 1\right] + O(N^{-2}).$$

Hence to avoid an infinite expectation c must be positive, and to avoid a multiple of $\sum a_r^{-1}$, which may
be large, in the expectation, we must have b = c − 2. The values b = 0, c = 2 give a simple formula, though
c = 3, b = 1 gives an expectation nearer to $E(\chi^2)$ when N is large.
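Let $n_r = Na_r + x_r$. Then

$$\chi^2 = N^{-1}\sum_r a_r^{-1}x_r^2,$$

$$\chi'^2 = N^{-1}\sum_r a_r^{-1}x_r^2\left(1 + \frac{x_r}{Na_r}\right)^{-1} = \sum_{i=2}^{\infty}\sum_r (-1)^i (Na_r)^{1-i}x_r^i,$$

$$\chi''^2 = N^{-1}\sum_r a_r^{-1}x_r^2\left(1 + \frac{x_r + 2}{Na_r}\right)^{-1} = \sum_{i=1}^{\infty}\sum_r (-1)^{i-1}(Na_r)^{-i}x_r^2(x_r + 2)^{i-1}.$$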
i=l r 


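To find the expectations of these quantities we require the expectations of powers of $x_r$, namely,

$$\mathcal{E}(x_r) = 0,\quad \mathcal{E}(x_r^2) = Na_r(1-a_r),\quad \mathcal{E}(x_r^3) = Na_r(1-a_r)(1-2a_r),\quad \mathcal{E}(x_r^4) = 3N^2a_r^2(1-a_r)^2 + O(N).$$

If we write $\mathcal{E}^*(x_r^i)$ to mean the expected value of $x_r^i$ when $n_r$ is not zero, we omit the cases where $x_r = -Na_r$,
which have a probability $(1-a_r)^N$, which tends to zero quicker than any negative power of N. Thus

$$\mathcal{E}^*(x_r) = \frac{Na_r(1-a_r)^N}{1-(1-a_r)^N},\quad \mathcal{E}^*(x_r^2) = \frac{Na_r(1-a_r) - N^2a_r^2(1-a_r)^N}{1-(1-a_r)^N},\quad \text{etc.}$$

So

$$\mathcal{E}(\chi^2) = \sum_{r=1}^{m}(1-a_r) = m-1,$$

$$\mathcal{E}(\chi'^2) = \infty,$$

$$\mathcal{E}^*(\chi'^2) = m-1+N^{-1}\left(2\sum_r a_r^{-1} - 3m+1\right) + O(N^{-2}),$$

$$\mathcal{E}(\chi''^2) = (m-1)\left(1-N^{-1}\right) + O(N^{-2}).$$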


Thus even if we exclude the samples where any $n_r$ is zero, $\chi'^2$ has a positive bias often exceeding twice
the reciprocal of the smallest expectation. The bias of $\chi''^2$ is smaller, and readily calculated. The higher
moments of the distribution of $\chi''^2$ and of $\chi'^2$, provided samples where any $n_r = 0$ are excluded, differ
from those of $\chi^2$ by quantities of the order $N^{-1}$. Errors of this order are neglected in the ordinary use
of $\chi^2$, and can be neglected in that of $\chi''^2$, since $\chi^2$ would be used if great precision were required.

As a numerical example, suppose that the numbers expected in four classes are 63, 21, 21 and 7, those
observed being 71, 13, 16 and 12. Then $\chi^2$ = 8.825, $\chi'^2$ = 9.470, $\chi''^2$ = 8.319. If we reverse the signs of the
deviations, so that the observed numbers are 55, 29, 26 and 2, we find $\chi^2$ = 8.825, $\chi'^2$ = 16.832,
$\chi''^2$ = 10.330. The addition of the bias 0.0268 to $\chi''^2$ gives values of 8.345 and 10.357, and this correction
is clearly negligible. It is clear that $\chi''^2$ is a far better approximation than $\chi'^2$, and as it is no harder to
calculate, it should be preferred.
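
The numerical example can be checked with a short Python sketch. It assumes that the proposed substitute uses the divisor $n_r + 2$, which is the choice that reproduces the quoted figures (the first value comes out near 8.318, within a unit in the last place of the printed 8.319); the function and variable names are illustrative only.

```python
def chi_square_statistics(observed, expected):
    """Return (chi2, neyman, haldane) for matching lists of class counts."""
    pairs = list(zip(observed, expected))
    # Ordinary chi-square: squared deviation over the expected number.
    chi2 = sum((n - e) ** 2 / e for n, e in pairs)
    # Neyman's form divides by the observed count, so it fails if any count is zero.
    neyman = sum((n - e) ** 2 / n for n, e in pairs)
    # Assumed divisor n + 2 for the proposed substitute (an inference from the figures).
    haldane = sum((n - e) ** 2 / (n + 2) for n, e in pairs)
    return chi2, neyman, haldane


expected = [63, 21, 21, 7]
first = chi_square_statistics([71, 13, 16, 12], expected)
reversed_signs = chi_square_statistics([55, 29, 26, 2], expected)

# Leading-order bias magnitude (m - 1) / N, quoted in the text as 0.0268.
bias = (len(expected) - 1) / sum(expected)

print([round(v, 3) for v in first])
print([round(v, 3) for v in reversed_signs])
print(round(bias, 4))
```

Reversing the signs of the deviations leaves the ordinary statistic unchanged but moves Neyman's form from about 9.5 to about 16.8, which is the instability the note draws attention to.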


REFERENCES 


HALDANE, J. B. S. (1953). A class of efficient estimates of a parameter. Bull. Int. Statist. Inst. 33, 
231-48. 

JEFFREYS, H. (1948). Theory of Probability. Oxford University Press.

Neyman, J. (1930). Bull. Int. Statist. Inst. 24, 44-86. 


A problem in the significance of small numbers 


By J. B. S. HALDANE
Department of Biometry, University College, London 


Dr H. Grüneberg and Dr A. G. Searle have each presented me with the following problem. N mice
were examined anatomically. Before examination they had been classified into m groups, containing
$n_1, n_2, n_3, \ldots, n_r, \ldots, n_m$ members. The members of a group were in fact grouped together on the basis
of common ancestry. They could have been grouped on the basis of phenotypic resemblance. In one and
only one of these groups, containing n members, a mice were found with a specific anatomical abnormality.
What is the probability that such a coincidence should occur as the result of random sampling?

This question, like all such questions, is rather vaguely stated; and the answer to it may depend on 
the theory of probability adopted by the answerer. The answer here given is therefore not the only 
possible one. 

We can arrange a fourfold table as follows: 


                  Normal    Abnormal
One group         n − a     a
Other groups      N − n     0


The application of Fisher's (1935) ‘exact’ method gives

$$P_1 = \frac{(N-a)!\,n!}{N!\,(n-a)!}$$


as the probability that all the abnormals should belong to this particular group. If they had all belonged
to any one group this would have attracted notice provided $P_1$ was fairly small. If, however, one group
had consisted of about half the total, that is to say $n = \tfrac{1}{2}N$ approximately, and a had not exceeded 3,
the question of significance would not have arisen. The question can therefore be put somewhat more
concretely: ‘What is the probability that if there are a or more abnormals, all should occur in a group
with n or less members?’ Or still more definitely: ‘If there were just a abnormals, and each group
consisted of n members, what is the probability that by an accident of sampling, all the abnormals should
be found in one group?’
The probability that all the abnormals should be found in one group is

$$P_2 = \sum_{r=1}^{m} \frac{(N-a)!\,n_r!}{N!\,(n_r-a)!}.$$

If each group consists of just n members, which implies that N/n is an integer, this probability is

$$P_3 = P_1 \times \frac{N}{n} = \frac{N\,(N-a)!\,n!}{n\,N!\,(n-a)!} = \frac{(N-a)!\,(n-1)!}{(N-1)!\,(n-a)!}.$$


I suggest that this is a reasonable estimate of the probability of the observed event, or of one equally
or more unlikely, even if the values of $n_r$ are unequal. We see that if a = 1, $P_3$ = 1, which is reasonable,
since it is certain that the one abnormal will fall into one of the groups. We can then frame the question
as follows: ‘The first abnormal individual was found in a certain group. What is the probability that the
a − 1 abnormals found among the other N − 1 mice should also be members of this group?’ Clearly the
value found, i.e. $P_3$, is a reasonable answer to this question.

In the case propounded to me by Dr Searle, N = 472, n = 23, a = 2, m = 4. So

$$P_3 = \frac{470!\,22!}{471!\,21!} = \frac{22}{471} = 0.047.$$


The uncorrected value of P is $P_1$ = 0.0023. At least one of the other groups must have consisted of
150 or more mice. Had the two abnormals belonged to this group no question of significance would
have arisen. The estimation of P as about 0.05 rather than about 0.002 allows for the fact that, if the
coincidence in question is an effect of random sampling, a number of other comparable coincidences
would not be considered significant.
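
The two probabilities for Dr Searle's case can be verified with exact rational arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the small equal-group example (three groups of four) is invented purely to check that the general sum agrees with the equal-group formula.

```python
from fractions import Fraction
from math import comb


def p_particular_group(N, n, a):
    """P1: all a abnormals fall in one specified group of n among N animals."""
    return Fraction(comb(n, a), comb(N, a))


def p_any_group(N, group_sizes, a):
    """P2: all a abnormals fall in some one group, for arbitrary group sizes."""
    return sum(Fraction(comb(n_r, a), comb(N, a)) for n_r in group_sizes)


def p_equal_groups(N, n, a):
    """P3: the same probability when every group has exactly n members."""
    return Fraction(N, n) * p_particular_group(N, n, a)


p1 = p_particular_group(472, 23, 2)
p3 = p_equal_groups(472, 23, 2)
print(float(p1), p3, float(p3))

# Hypothetical check: with three groups of four, P2 and P3 coincide.
print(p_any_group(12, [4, 4, 4], 2) == p_equal_groups(12, 4, 2))
```

Using `Fraction` keeps the result as 22/471 exactly, so the agreement with the factorial expression does not depend on floating-point rounding.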
Any test of significance is somewhat arbitrary. For example, in place of the $\chi^2$ test where

$$\chi^2 = \sum_{r=1}^{n} \frac{[a_r - \mathcal{E}(a_r)]^2}{\mathcal{E}(a_r)},$$

where $a_r$ is an observed number and $\mathcal{E}(a_r)$ its expectation, we could use

$$\sum_{r=1}^{n} \frac{|a_r - \mathcal{E}(a_r)|}{[\mathcal{E}(a_r)]^{1/2}},$$

or many other similar statistics. $\chi^2$ is used because it has a fairly simple distribution. On the same data
the latter might give a higher or a lower value of P. The test here suggested is easy to apply, and [...]
I think, however, that the problem merits a fuller discussion, and that a solution based on a different
approach might be of equal or greater value.


REFERENCE 
FISHER, R. A. (1935). The logic of inductive inference. J. R. Statist. Soc. 98, 39-54.




Bounds for the ratio of range to standard deviation 


By GEORGE W. THOMSON 
Ethyl Corporation, Detroit 20, Michigan 


1. This note supplements the work of David, Hartley & Pearson (1954) on the distribution of the ratio 
of the range w to the standard deviation estimate s, where both are from the same sample of size n. 
Bounds are shown to exist for w/s for all populations with non-zero variance and percentage points are 
given for samples of three from a normal population. 


2. The bounded nature of the distribution of w/s has not been noted by any of the authors who have
investigated this statistic*. It can be readily shown that the upper and lower bounds for w/s for samples
from any population with non-zero variance arise from certain simple configurations of sample points.
The upper bound, which corresponds to minimum s for a given range w, results from the arrangement
with n − 2 of the points at the sample mean and the other two points at equal distances from the mean.
The lower bound, which corresponds to maximum s for a given w, results from the concentration of half
of the sample points at one extreme and the other half (plus one, if the sample size is odd) of the sample
points at the other extreme. The numerical values of the bounds can be shown to be:
Upper bound of w/s: $\sqrt{2(n-1)}$.
Lower bound of w/s: $2\sqrt{(n-1)/n}$ for n even,
$2\sqrt{n/(n+1)}$ for n odd.
As n becomes larger the lower bound approaches 2. It is also evident that these bounds are distribution-free
provided that the points can be distributed at all. Table 1 shows the numerical values which correspond
to the lower and upper 0 % points in Table 6 of the paper by David et al.

Table 1. Bounds of the distribution of the ratio of range to standard deviation,
w/s, in samples of size n from a normal population


* [Editorial Note. The existence of these limits has no doubt been noticed by others who have 
considered the distribution of this ratio; in the correspondence leading to the joint paper by David, 
Hartley & Pearson (1954), the first author gave these limits in a letter of February 1954, but they 
were omitted in the published paper. E.S.P.] 
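
The bounds stated in paragraph 2 can be checked numerically against the extreme configurations described there. The sketch below is only an illustration: the function names are hypothetical, and s is taken as the usual estimate with divisor n − 1, which is the convention under which the stated formulae hold.

```python
import math
import statistics


def ws_bounds(n):
    """Return (lower, upper) bounds of range / standard deviation for sample size n."""
    upper = math.sqrt(2 * (n - 1))
    if n % 2 == 0:
        lower = 2 * math.sqrt((n - 1) / n)
    else:
        lower = 2 * math.sqrt(n / (n + 1))
    return lower, upper


def ws_ratio(sample):
    """Range divided by the standard deviation estimate (divisor n - 1)."""
    return (max(sample) - min(sample)) / statistics.stdev(sample)


# Extreme configurations for n = 5 and n = 4.
upper_config = [-1, 0, 0, 0, 1]   # n - 2 points at the mean, two equidistant from it
lower_config_odd = [0, 0, 1, 1, 1]  # points split between the two extremes, n odd
lower_config_even = [0, 0, 1, 1]    # points split equally, n even

print(ws_bounds(5), ws_ratio(upper_config), ws_ratio(lower_config_odd))
print(ws_bounds(4), ws_ratio(lower_config_even))
```

A computed w/s falling outside `ws_bounds(n)` indicates an arithmetical error in s, which is the routine check the note recommends.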


3. Certain properties of the distribution of samples of size 3 from a normal population have been
obtained by Lieblein (1952), as noted by Seth (1950). It is easily shown from Lieblein's results that the
percentage points of w/s for n = 3 are given by

$$2\cos[30^\circ(1 - F)],$$

where F is the cumulative frequency. Thus, for the upper 10 % point, F = 0.90, w/s = 2 cos 3° = 1.99726.
Table 2 shows the upper and lower 0.0, 0.5, 1.0, 2.5, 5.0 and 10.0 % points and the median. The upper
percentage points agree to the third decimal place with those given by David et al. (1954, Table 6).


Table 2. Percentage points of the distribution of the ratio of range to standard
deviation, w/s, in samples of size 3 from a normal population


Lower percentage              Upper percentage
points            w/s         points            w/s
  0.0           1.73205        10.0           1.99726
  0.5           1.73466         5.0           1.99931
  1.0           1.73726         2.5           1.99983
  2.5           1.74499         1.0           1.99997
  5.0           1.75763         0.5           1.99999
 10.0           1.78201         0.0           2.00000
Median
 50.0           1.93185
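
The entries of Table 2 follow directly from the expression 2 cos[30°(1 − F)]. A minimal Python sketch, with a hypothetical function name, evaluates it for the lower points (F = p/100), the upper points (F = 1 − p/100) and the median.

```python
import math


def ws_point_n3(F):
    """Value of w/s at cumulative frequency F for samples of three from a normal population."""
    return 2 * math.cos(math.radians(30 * (1 - F)))


percentages = [0.0, 0.5, 1.0, 2.5, 5.0, 10.0]
lower_points = {p: ws_point_n3(p / 100) for p in percentages}
upper_points = {p: ws_point_n3(1 - p / 100) for p in percentages}
median = ws_point_n3(0.5)

for p in percentages:
    print(f"{p:5.1f}  lower {lower_points[p]:.5f}  upper {upper_points[p]:.5f}")
print(f"median {median:.5f}")
```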


4. The bounds for w/s have been used in our research laboratory for the past twelve years in routine
checks of the computation of s for small samples, since gross errors can be detected at once.


REFERENCES 


DAVID, H. A., HARTLEY, H. O. & PEARSON, E. S. (1954). Biometrika, 41, 482-93.
LIEBLEIN, J. (1952). J. Res. Nat. Bur. Stand. 48, 255-68.
SETH, G. R. (1950). Ann. Math. Statist. 21, 298-301.


On the estimation of population parameters from marked members 


By J. A. GULLAND
Fisheries Laboratory, Lowestoft 


In commercial fish populations there exist two distinct types of mortality, that due directly to capture
by man, and that due to other causes. For simplification it is not unreasonable to assume that these may
be represented by the constant exponential coefficients, denoted by F and M. It is the separation and
estimation of these which forms a major part of any study of such populations. [...] of millions, and
recaptures from a marking experiment [...] by any fisherman and recaptures may therefore be considered as
[...] rather than at discrete intervals, as in most experimental trapping (e.g. Hammersley, 1953). [...]
released, then direct maximum likelihood [...] in interval (t, t + dt) is $F e^{-(F+M)t}\,dt$. Suppose now N
individuals are released, of which n are recaptured at times $t_1, \ldots, t_n$, and that these are the only
recaptures [...]. Then the probability of being recaptured is

$$\int_0^T F e^{-(F+M)t}\,dt = \frac{F}{F+M}\left(1 - e^{-(F+M)T}\right),$$



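and the probability of not being recaptured is

$$\frac{M}{F+M} + \frac{F}{F+M}\,e^{-(F+M)T}.$$

The likelihood function may therefore be written

$$e^{L} = \binom{N}{n}\prod_{i=1}^{n}\left(F e^{-(F+M)t_i}\right)\left(\frac{M + F e^{-(F+M)T}}{F+M}\right)^{N-n}.$$

Hence

$$L = \log\binom{N}{n} + n\log F - (F+M)\sum t_i + (N-n)\left\{\log\left(M + F e^{-(F+M)T}\right) - \log(F+M)\right\},$$

$$\frac{\partial L}{\partial F} = \frac{n}{F} - \sum t_i + (N-n)\left\{\frac{(1-FT)\,e^{-(F+M)T}}{M + F e^{-(F+M)T}} - \frac{1}{F+M}\right\},$$

and

$$\frac{\partial L}{\partial M} = -\sum t_i + (N-n)\left\{\frac{1 - FT\,e^{-(F+M)T}}{M + F e^{-(F+M)T}} - \frac{1}{F+M}\right\}.$$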

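If the experiment is continued so long that $e^{-(F+M)T}$ may be neglected, then these equations become
very simple, namely,

$$\frac{\partial L}{\partial F} = \frac{n}{F} - \sum t_i - \frac{N-n}{F+M},\qquad \frac{\partial L}{\partial M} = -\sum t_i + \frac{(N-n)F}{M(F+M)}.$$

Putting these equal to zero gives the solution

$$\hat{F} = \frac{n^2}{N\sum t_i},\qquad \hat{M} = \frac{(N-n)\,n}{N\sum t_i}.$$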


These estimates are biased, in fact having infinite expectations and higher moments. However, $\sum t_i/n$
is an unbiased estimate of $(F+M)^{-1}$, when $e^{-(F+M)T}$ is neglected, and in practice, for reasonably large
values of n and N the distribution of $\hat{F}$ and $\hat{M}$ for repeated experiments is likely to be quite reasonable.
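
As an illustration of the long-experiment solution, the following Python sketch computes the two estimates from a number released and a list of recapture times, and confirms that the simplified likelihood equations vanish at them. The release number and the times are invented for the example and carry no biological meaning; the function names are hypothetical.

```python
def mortality_estimates(N, recapture_times):
    """Estimates of F and M when exp(-(F + M) T) is negligible.

    N is the number released; recapture_times holds the n observed times t_i.
    """
    n = len(recapture_times)
    total_time = sum(recapture_times)
    F_hat = n * n / (N * total_time)
    M_hat = (N - n) * n / (N * total_time)
    return F_hat, M_hat


def simplified_scores(N, recapture_times, F, M):
    """Derivatives of L with respect to F and M in the long-experiment limit."""
    n = len(recapture_times)
    total_time = sum(recapture_times)
    dL_dF = n / F - total_time - (N - n) / (F + M)
    dL_dM = -total_time + (N - n) * F / (M * (F + M))
    return dL_dF, dL_dM


# Hypothetical data: 100 fish released, 4 recaptured at the times shown (arbitrary units).
N = 100
times = [0.5, 1.0, 1.5, 2.0]
F_hat, M_hat = mortality_estimates(N, times)
scores = simplified_scores(N, times, F_hat, M_hat)
print(F_hat, M_hat, scores)
```

The sum of the two estimates equals n divided by the total of the recapture times, and their ratio reproduces the recaptured fraction n/N, which is a convenient check on any hand computation.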


REFERENCE 
HAMMERSLEY, J. M. (1953). Biometrika, 40, 265-78. 


REVIEWS 


Biomathematics. By CEDRIC A. B. SMITH. London: Charles Griffin and Co. Ltd.
Pp. xv + 712. 80s. net.
Biomathematics, by the late Dr W. М. Feldman, appeared in 1923 and reached a second edition in 
1935. The present book is modestly described by Dr Smith as a third edition. But it has been entirely 
rewritten and is, in effect, a new book. 

Although advanced statistical theory uses nearly all branches of mathematics, except perhaps projective
and differential geometry, it draws on them to very varying extents. The student of statistics
can go a long way with only a nodding acquaintance with analytic geometry and differential equations,
but finds a knowledge of calculus and advanced algebra almost essential from the outset. He therefore
requires a special course of mathematics for statisticians and will not find it within the covers of any
single book. There is a great and growing need for a work of this kind.

Dr Smith's book does for biologists what one would like to see done for statisticians in general. It 
provides a thorough grounding in the ideas and techniques necessary for those who want to be mathe- 
matical biologists without going so far as to be biological mathematicians; and it does so with great 
fluency and insight without any sacrifice of rigour. There can be few writers as well qualified as Dr Smith 
to write such a work. His mathematical powers, his extensive knowledge and practical experience of 
mathematical and statistical applications in biology, the care he has bestowed on the work and his 
gifts as a teacher combine to make this a work of outstanding excellence. It will be useful not only to 
biologists but to any student who requires a competence in mathematics for the purposes of pursuing 
his own subject. 

The book opens with two chapters on arithmetic, including some calculating devices and some 
account of mechanical equipment and punched card machines. Chapter 3 is a refresher course on 
algebra, leading to the treatment of inequalities in Chapter 4. Chapter 5 deals with the connexion 
between algebra and geometry and provides a foundation in analytic geometry and conic sections.
The next two chapters deal with logarithms (which Dr Smith introduces by means of the equi-angular 
spiral) and their applications in computation by slide-rules and nomograms. Chapters 8-12 are on 
differential and integral calculus and proceed as far as simple differential equations and partial differ- 
entials, Chapter 13 deals with series and Chapter 14 with vectors. Methods of solving equations and 
matrices receive separate chapters, and the book closes with three chapters on Chance, Statistical 
Distributions and Simple Statistical Procedures, together with one on Colson's method of simplifying 
arithmetical calculations. The book contains over 700 pages and the amount of ground covered is
astonishing considering that no impression is given of hurry or over-consideration.

Readers of Stephen Leacock will remember his protests about the dullness of arithmetical examples,
and his attempt to dispel it by investing with human attributes those three anonymous characters of
our youth A, B and C; especially the case of poor C, who died of pneumonia contracted while trying
to fill a cistern with a leak in it. Dr Smith, who combines with his other qualities a sense of humour,
would have endeared himself greatly to Leacock. For instance, the additive properties of matrices
are illustrated on the population of Narkover, the determination of minima is exemplified, not on
those eternal salmon tins, but on the angles of origin of human arteries; the treatment of physical
magnitudes is enlivened by an example leading to the conclusion that if one falls into a cold sea the
best thing to do is to swim hard so as to keep warm.

Altogether, this is a very sound, readable and sensible book. In spite of its price it should become 
widely popular. M. G. KENDALL 


Introduction to Mathematical Statistics. 2nd edition. By PAUL G. HOEL. New York:
John Wiley and Sons Inc.; London: Chapman and Hall. 1954. Pp. ix + 331. 40s.


This is a much expanded and revised version of a book that first appeared in 1947. Apart from the
expansion of many topics such as regression, the analysis of variance and non-parametric methods,
there have been additions in the probability field. As a result the book is much more self-
contained than formerly. It presupposes a knowledge of calculus (probably about two years' study)
and seems designed for scientists who have some mathematical background and wish to master the




elements of statistical theory. For the applications to their own fields, they will have to turn to other 
texts. 

Inevitably in a book of this size some things have had to be omitted, for example, time series,
equalization of variance and probit analysis. It is perhaps surprising, in view of the intended public,
that no mention is made of Sheppard's corrections for grouping or the many short-cut procedures
available nowadays, such as the range in the analysis of variance. There is also an unusual differentiation
between large sample and small sample distributions: is the F distribution a small sample
distribution? Diagrams are used excellently throughout the book and are well labelled. There are
numerous exercises for which no answers are provided. The book ends with a number of useful tables,
but whilst the percentage points of t, F and χ² are given, those for the normal curve have to be obtained
by inverse interpolation in the table of the cumulative distribution provided. P. G. MOORE


Design and Analysis of Industrial Experiments. Edited by O. L. DAVIES. London and
Edinburgh: Oliver and Boyd, for Imperial Chemical Industries Limited. 1954.
Pp. xiii + 636. 63s.


This book is written by a team of authors largely the same as those who wrote Statistical Methods in 
Research and Production in 1947, but, as explained in the Introduction, the present volume deals with 
the design of experiments and their subsequent analysis rather than the extraction of information 
from previously existing data. The particular point of the book is that it is written from a broadly 
chemical point of view, rather than the more usual agricultural one, and so the design of experiments 
is freed from the restrictions imposed by agricultural conditions, and sometimes inadvertently carried
over into fields where they do not apply.

Successive chapters deal with Simple Comparisons, Sequential Tests—a remarkably fine chapter 
—Sampling and Testing Methods, Randomized Blocks, and Incomplete Randomized Blocks, again 
very good because it concentrates attention on the simpler designs required for industrial experi- 
mentation. It is nice to have the assurance that these are worth while, for the alternative view that 
in them too few observations have to bear a heavy load of theoretical interpretation is somewhat 
prevalent. Then there are four chapters on Factorial Experiments and one on the determination of 
optimum conditions. In this last chapter the treatment is largely orographical, which I like as it 
enables simple words like ‘ridge’ to be used as short-cuts avoiding much difficult mathematical 
explanation. Then follows a glossary which is far the worst part of the book. No readers of a book of 
this type need to have explained what is quaintly called ‘The arithmetic average or mean’, and if 
they do the definition here given will hardly help them, nor is it possible to explain ‘Universe, Popula- 
tion, Parameter, Sample, Statistic’ in fourteen lines. The final tables are well arranged. 

The book as a whole contains a very large amount of valuable information, but as might be expected 
from a team the style varies greatly in clearness and readability ; certainly the best parts are very good. 
There is an implied assumption that nothing but the experiment and its own conditions need be 
considered in planning, and this is certainly not always true for those who work in less magnificent 
concerns than I.C.I., but still you must know how the experiment really ought to be done, before
planning a compromise with what can be done, and the necessary information is here. 

Finally, it must be emphasized that in industry an investigation is not complete when the results 
have been obtained in conventional statistical form. It is necessary to translate them into the language 
used by the executive who is to decide what action to take on them. It is avowedly not the purpose 
of this book to deal with this question, but there is internal evidence that at least one of the authors 
would be well able to do so, and I should like to suggest that in a future edition such a chapter should 
be added in place of the glossary. L. MCMULLEN


Sample Survey Methods and Theory. By M. H. HANSEN, W. N. HURWITZ and W. G.
MADOW. New York: John Wiley and Sons Inc.; London: Chapman and Hall, 1953.
Vol. I, Methods and Applications. Pp. xxii + 638. 64s. Vol. II, Theory. Pp. xiii +
332. 56s.


These books aim to cover the whole field of the sample survey and cater for all tastes and all classes of
statistician. Vol. I sets out the methods in common use in sampling surveys and the way in which the
methods are applied. It begins with commonsense talk about the fundamental aims of sampling and 
sampling design and continues with the delineation of such statistical ideas as may be expected to be 


useful to the particular purpose of the authors. The chapter on bias and non-sampling errors in survey 
results is obviously written from practical experience. Much of the book is concerned with descriptions 
of the various types of sampling—simple random sampling, stratified simple random sampling, simple 
one or two stage cluster sampling, stratified single or multi-stage cluster sampling. In each case the 
procedure is clear, every possible query which might be raised by the would-be user is answered. The 
chapter on estimating variances and the accuracy of the method of estimation will not be helpful 
unless the reader has learned a little elementary statistics but presumably if he has reached this point 
in the book he will willy-nilly have acquired such basic information as he will here find to be necessary. 

Vol. II can either be read in conjunction with Vol. I or separately when it can be looked on as an
elementary text-book exercise in statistical algebra. The standard here is uneven. It is surely unnecessary
to explain ‘why we study summation notation’ (p. 11) to someone who is expected to understand
without explanation why we study convergence in probability (p. 72). None the less in spite of
irrelevancies the algebraic framework is clear and concise and should not cause undue difficulty to
any student interested enough to read the books. Certain contractions are possibly invented by the
writers and should be rejected; ‘plim’ standing for ‘limit in the sense of probability’ is one of the least
useful of these. All the formulae used in Vol. I are derived here, and the way is clear for the reader to
make such modifications as his own particular sampling design may dictate.

The two books taken together cover fully the whole field of sample surveys, and are unlikely to be 
superseded for some years to come. This being the case it is unfortunate that such encyclopaedic works 
should be priced out of the possible range for many persons. F. N. DAVID


Sampling Theory of Surveys with Applications. By PANDURANG V. SUKHATME.
New Delhi: The Indian Society of Agricultural Statistics; Ames, Iowa: The Iowa
State College Press. 1954. Pp. xxviii + 491. $6.


This is the fifth book dealing specifically with sample survey methods and theory to appear in the last 
five years. The author is head of the statistics branch of F.A.O. and has wide experience of sample
surveys in India and elsewhere, and one might have expected his book to be more closely concerned 
than it is with the methods of overcoming the difficulties arising in agricultural surveys and in surveys 
of underdeveloped territories. In fact, Dr Sukhatme has written a text-book on the algebraical develop- 
ment of the standard branches of sample-survey theory. On this basis the treatment is painstaking 
and clear and the book will be most valuable to the student or research worker who requires an ad hoc
treatment of the theory appropriate to any particular sampling method.

The up-to-date nature of the book is shown by the prominence given to unequal-probability
sampling, the treatment of which is based in part on the author's own research, and by the chapter on
non-sampling errors, which contains some novel results on the treatment of interviewing errors.

However, it is to be regretted that Dr Sukhatme in common with other recent writers on sample 
survey theory has neglected the opportunity to base his treatment of the theory on the simplifying 
principles set out by Yates five years ago in Sampling Methods for Censuses and Surveys. Writing mainly
for practical workers rather than for theoreticians—no proofs of the formulae were given—Yates 
based his exposition on the fact that most survey designs are built up of the following components: 


(a) Method of selection, e.g. simple random, systematic, probability proportional to size.

(b) Stratification.

(c) Use of supplementary information.

(d) Multi-stage sampling.

(e) Multi-phase sampling.

The number of combinations of the various possibilities under each of these heads is so large that
it would be impossible to give within reasonable bounds the appropriate estimation and variance
formulae for all sample designs likely to be useful in practice. However, Yates pointed out that the
contribution from each component could be considered separately and the results combined to give
the formulae for any particular design.

For example, the estimate of a mean based on a stratified sample normally takes the form
$\bar{y} = \sum N_i \bar{y}_i / \sum N_i$, where $\bar{y}_i$ is the estimate of the ith stratum mean. Suppose that by referring to the
formulae for the method of sampling under (a) above, we were able to calculate an estimate $s_i^2$ of $V(\bar{y}_i)$.
Yates would immediately write down the estimate of $V(\bar{y})$ as $\sum N_i^2 s_i^2 / (\sum N_i)^2$, and would assert that this
result held generally whatever methods are used for sampling within strata. Sukhatme, however, has
preferred to start from the beginning for each type of stratified sample and to develop explicit formulae


from first principles for each case, rather than prove a general rule for all types of stratified sample.
Similarly, in his treatment of both the ratio method and multi-stage sampling, the author has pre- 
ferred to treat each case separately from first principles rather than to develop the general rules by 
which the estimates can be built up. This is all the more regrettable since the general laws are easier 
to prove than the formulae for special cases. 
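
The general rule for stratified samples described above amounts to a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions: the stratum sizes, means and within-stratum variance estimates are invented figures, and the function name is hypothetical. Whatever method of sampling is used within strata enters only through the supplied variance estimates.

```python
def stratified_mean_and_variance(stratum_sizes, stratum_means, stratum_mean_variances):
    """Combine stratum estimates into an overall mean and its estimated variance.

    stratum_sizes          -- N_i, the number of units in each stratum
    stratum_means          -- estimates of the stratum means
    stratum_mean_variances -- s_i^2, estimated variances of those stratum-mean estimates
    """
    total = sum(stratum_sizes)
    mean = sum(N_i * y_i for N_i, y_i in zip(stratum_sizes, stratum_means)) / total
    variance = sum(
        N_i ** 2 * s2_i for N_i, s2_i in zip(stratum_sizes, stratum_mean_variances)
    ) / total ** 2
    return mean, variance


# Hypothetical two-stratum example.
mean, variance = stratified_mean_and_variance([100, 300], [10.0, 20.0], [4.0, 1.0])
print(mean, variance)
```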

A further opportunity for simplification was missed in the treatment of sampling with unequal 
probabilities with replacement. If we draw a sample of n with replacement from a population of 
values $y_1, y_2, \ldots, y_N$ with selection probabilities $p_1, p_2, \ldots, p_N$, all necessary estimation and variance
formulae can be obtained by regarding the sample as a simple random sample from the discrete
probability distribution in which the observation $y_i$ has associated with it the probability $p_i$. Thus,
once the theory of sampling from an infinite population with equal probabilities of selection has been 
worked out the corresponding results for unequal-probability sampling with replacement emerge as 
corollaries. In Dr Sukhatme's treatment, however, these results are worked out afresh. 
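
One way to see the corollary is through the estimate of a population total. The sketch below uses the standard estimator for sampling with replacement and unequal probabilities (the mean of the ratios $y_i/p_i$, usually associated with Hansen and Hurwitz); this choice of estimator is an assumption brought in for illustration, not something set out in the review, and the data are invented. Because the ratios are independent draws from one discrete distribution, the ordinary simple-random-sample formulae for a mean and its variance apply to them directly.

```python
import statistics


def total_estimate_with_replacement(sample_values, sample_probabilities):
    """Estimate a population total from draws made with replacement.

    sample_values        -- the y-value observed at each draw
    sample_probabilities -- the selection probability of the unit obtained at each draw
    Returns (estimate, estimated variance of the estimate).
    """
    # Each ratio y / p is an independent draw whose expectation is the population total.
    ratios = [y / p for y, p in zip(sample_values, sample_probabilities)]
    estimate = statistics.mean(ratios)
    # Usual variance of a mean of n independent observations.
    variance = statistics.variance(ratios) / len(ratios)
    return estimate, variance


# Hypothetical population values 10, 20, 70 with probabilities 0.2, 0.3, 0.5;
# the two draws happened to give the first and third units.
estimate, variance = total_estimate_with_replacement([10, 70], [0.2, 0.5])
print(estimate, variance)
```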

For these reasons the book contains a great deal of avoidable algebra. Although some students 
undoubtedly find it helpful to have an ad hoc theory set out for each particular type of sample design, 
the consequent details are likely to prove unnecessarily time-consuming for the reader whose object 


is to grasp the principles of sample-survey theory. J. DURBIN


Life and other Contingencies, Vol. I. By P. F. HOOKER and L. H. LONGLEY-COOK.
Cambridge University Press. 1953. Pp. viii + 312. 22s. 6d. 


This book is one of a series commissioned by the Institute of Actuaries and the Faculty of Actuaries 
to provide a course of reading suitable for the examinations conducted by these bodies. The present 
work replaces Spurgeon’s Life Contingencies, which now ends an honourable career of some forty 
years. The authors have evidently been influenced to a considerable extent by their distinguished 
predecessor. The main changes arise from the demands of the new actuarial syllabus and consist of 
the introduction of sections on extra risks, valuation methods and non-mortality (sickness, maternity, 
etc.) benefits. There are also a number of additions necessitated by developments in theory and prac- 
tice, such as the treatment of family income benefits and the appendix on International Actuarial 
Notation. 

However, the book is by no means merely a rather thorough revision of the earlier text-book. It 
is a completely new work and develops the subject in its own way. Emphasis is definitely on the pro- 
vision of methods for practical application in life-office work. There is little preoccupation with com- 
plicated algebraic manipulation, and the theoretical study of mortality laws and stationary populations 
is reduced to a minimum. Worked examples are collected at the ends of chapters and, perhaps for this 
reason, appear to be less profuse than in Spurgeon’s text-book. There are no exercises to be worked 
out by the student.

The treatment of multiple decrement tables is left to Vol. II, and it will be interesting to see how
the authors tackle this still somewhat controversial topic. As a result of this division of subject-matter 
joint-life assurances and annuities are not considered in the present volume. 

It is to be hoped that ‘Hooker and Longley-Cook’ will be as long-lived as ‘Spurgeon’. If this is so 
the reviewer hopes to see in the successive revisions a much fuller treatment of non-mortality benefits, 
and, if possible, either an earlier and more adequate discussion of the construction of tables, or the 
omission of this subject by its transfer to another part of the actuarial syllabus. N. L. JOHNSON 


Population Statistics and their Compilation. By HUGH H. WOLFENDEN. The Uni-
versity of Chicago Press, for the Society of Actuaries. 1954. Pp. xxii + 258. 56s. 6d.


This book is a completely revised edition of a work originally published as ‘Actuarial Study No. 3’ 
by the Actuarial Society of America in 1925. The choice of subject-matter clearly reflects the actuarial 
interests of the author. In the analytical section of the book there is much greater emphasis on the 
study of mortality than on any other aspect of population phenomena. No fewer than 70 pages are 
devoted to the discussion of various methods of constructing mortality-tables and life-tables, while 
a further 23 pages are concerned with the comparison of mortality by occupations, causes, etc. and
a brief discussion on forecasting mortality rates. There are only eight pages on the use of data on
marriages, births, orphanhood and unemployment and three pages on ‘Sickness Data’. Emigration and
immigration are not specifically considered at all. The theory of reproductivity is relatively adequately 


"———————— 


Reviews 275 


treated in 16 pages containing some sensible warnings on the uncritical use of the various indices of 
reproductivity. 

The earlier part of the book shows much less evidence of this lack of balance. There is an interesting
and useful description of the compilation of population statistics by means of censuses, and registra- 
tion of births, marriages and deaths. This is accompanied by a discussion of likely sources of un- 
reliability in the data, and methods of correcting the consequent defects. 

Comprehensiveness has evidently been the author's ideal. While he has succeeded in making his 
book a mine of information on particular topics, it is to be expected that students may experience 
difficulty in sorting out really important material from items now mainly of historical interest. The 
dust-cover claims that ‘this book is the only presentation, by an actuary, of the particular actuarial 
viewpoints and methods necessary to the production of modern population statistics’. Apart from the 
fact that this statement ignores books published by the Institute of Actuaries in recent years, the 
catalogue-like assembly of methods presented makes difficult the disentanglement of the more modern 
methods. 

Despite these criticisms, the book should prove useful as a comprehensive, yet handy, work of 
reference. There is no index, though there is a detailed ‘Table of Contents’ by paragraphs (not by pages). 
There are nearly one hundred footnotes which bear witness to the author’s devotion to the ideal of 
comprehensiveness. Most of them contain useful information, though they tend to interrupt the 
textual line of argument. 

There is a 22-page appendix on ‘Some Theory in the Sampling of Human Populations’, by 
W. E. Deming. It is well written but will be of little value except to those with sufficient statistical 
training and such persons will probably be familiar with the subject already. N. L. JOHNSON 


Table of Binomial Coefficients. Royal Society Mathematical Tables, 3. London: 
Cambridge University Press. 1954. Pp. viii + 162. 35s.


These tables, prepared under the editorship of J. C. P. Miller, were computed mainly at the National 
Physical Laboratory, Liverpool University, and the Royal Aircraft Establishment at Farnborough. 


They give \({}^{n}C_{r}\) for
(i) r ≤ ½n ≤ 100,
(ii) 2 ≤ r ≤ 12, 200 ≤ n ≤ 500,
(iii) 2 ≤ r ≤ 11, 500 ≤ n ≤ 1000,
(iv) 2 ≤ r ≤ 6, 1000 ≤ n ≤ 2000,
(v) r = 2, 3, 2000 ≤ n ≤ 5000.


The computations throughout are performed by use of the recurrence relation 
\[ {}^{n+1}C_{r} = {}^{n}C_{r} + {}^{n}C_{r-1}. \]

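The recurrence just quoted can be illustrated with a short Python sketch. The function name `binomial_table` and the row-by-row layout are assumptions made for illustration; nothing here is claimed about how the published tables were actually organized or checked.

```python
def binomial_table(n_max):
    """Rows 0..n_max of binomial coefficients.

    Each new row is built from the previous one with the recurrence
    C(n+1, r) = C(n, r) + C(n, r-1); the two end entries are always 1.
    """
    rows = [[1]]
    for n in range(n_max):
        prev = rows[-1]  # row n, holding C(n, 0) .. C(n, n)
        row = [1] + [prev[r] + prev[r - 1] for r in range(1, n + 1)] + [1]
        rows.append(row)
    return rows


table = binomial_table(20)
print(table[10][3])  # C(10, 3) = 120
```

Only additions are needed, which is why a recurrence of this kind suits exact integer tabulation.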

The main application of the table lies in the field of number theory for such investigations as to the 
possible uses of tetrahedral numbers to form other numbers. For statisticians the combinatorial numbers 
mainly conjure up visions of binomial probabilities and here, as the probabilities have to be combined 
with the combinatorial number, the volume is unlikely to supplant such well-worn friends as the in- 
complete beta function, at any rate for the lower values of n. Occasions do, however, arise where bi- 
nomial coefficients alone are required—certain probability problems of a combinatorial nature for 
example—and these tables will form a useful reference work for such cases. The volume is well set out 
with very clear type, and a useful index is provided to aid one in locating the value required.


Bessel Functions and Formulae. Compiled by W. G. BICKLEY. London: Cambridge
University Press, for the Royal Society. 1953. Pp. 11. 3s. 6d.


This collection of formulae is a straight reprint of the Summary of Notations and the section on
‘Functions and Formulae’ from the British Association Mathematical Tables, Vol. x.
It gives a summary of the differential, integral and difference relations obeyed by the
Bessel functions and also some connecting formulae with related functions (e.g. Hankel's, Whittaker's,
etc.). It further gives a variety of series expansions for and involving Bessel functions.
Of particular interest to statisticians are the sections giving generating functions and
transforms of Bessel functions. D. E. BARTON



PUBLICATIONS OF U.S. DEPARTMENT OF COMMERCE, 
NATIONAL BUREAU OF STANDARDS 


(i) Tables of Chebyshev Polynomials \(S_n(x)\) and \(C_n(x)\). Applied Mathematics Series, 9.
1952. Pp. xxix + 161. $1.75.


Defining the nth order polynomials as 
\[ C_n(x) = 2\cos\{n \arccos \tfrac{1}{2}x\}, \qquad S_n(x) = 2(4 - x^2)^{-1/2} \sin\{(n+1) \arccos \tfrac{1}{2}x\}, \]

the main tables give their values to 12 decimal places for n = 2(1)12, x = 0(0.001)2. Subsidiary tables 
give the expansions of these polynomials in powers of x, together with those of the modified forms: 

\[ T_n(x) = \tfrac{1}{2} C_n(2x), \qquad T_n^*(x) = \tfrac{1}{2} C_n(4x - 2), \]
\[ U_n(x) = S_n(2x), \qquad U_n^*(x) = S_n(4x - 2). \]

Also given are inverse relations for powers of x as linear sums of \(T_n(x)\) and \(T_n^*(x)\) up to the twelfth power. 
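The trigonometric definitions above can be checked numerically. The following Python sketch evaluates \(C_n\) and \(S_n\) from those definitions and compares them with the three-term recurrence \(f_{n+1}(x) = x f_n(x) - f_{n-1}(x)\), which follows from the identity \(\cos(n+1)\theta + \cos(n-1)\theta = 2\cos\theta\cos n\theta\) and its sine analogue. The function names are illustrative assumptions, and the sketch is limited to |x| < 2.

```python
import math


def c_trig(n, x):
    """C_n(x) = 2 cos(n arccos(x/2))."""
    return 2.0 * math.cos(n * math.acos(x / 2.0))


def s_trig(n, x):
    """S_n(x) = 2 (4 - x^2)^(-1/2) sin((n+1) arccos(x/2)), for |x| < 2."""
    theta = math.acos(x / 2.0)
    return 2.0 / math.sqrt(4.0 - x * x) * math.sin((n + 1) * theta)


def by_recurrence(n, x, f0, f1):
    """Value f_n(x) from f_{k+1} = x f_k - f_{k-1}, starting at f_0 and f_1."""
    a, b = f0, f1
    for _ in range(n):
        a, b = b, x * b - a
    return a


x = 0.7
for n in range(2, 13):
    c_rec = by_recurrence(n, x, 2.0, x)  # C_0 = 2, C_1 = x
    s_rec = by_recurrence(n, x, 1.0, x)  # S_0 = 1, S_1 = x
    print(n, c_trig(n, x) - c_rec, s_trig(n, x) - s_rec)
```

With these definitions \(T_n(x) = \tfrac{1}{2}C_n(2x)\) reduces to \(\cos(n \arccos x)\), the usual Chebyshev polynomial of the first kind.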

These polynomials are chiefly of use as ancillaries to the computing of functions possessing power 
series expansions over a whole range of values of the argument, as the convergence of a series of Cheby- 
shev polynomials is more uniform and in general more rapid than that of a power series expansion taken 
about a point of the range. The computation was carried out under the direction of A. N. Lowan and a 
22-page introduction is written by Cornelius Lanczos. This gives, inter alia, the example of the asymp- 
totic expansion of the incomplete normal integral, where the Chebyshev series to six terms gives one-fifth 
the error of approximation of the power series expansion of the same order, for the possibility of deviating 
more than 4/2 standard deviations from the mean. These polynomials are distinct from those commonly 
used in statistical practice and also described as Chebyshev’s, defined by 


[formula illegible in the source scan]


(ii) Tables of coefficients for the numerical calculation of Laplace transforms. 
Applied Mathematics Series, 30. 1953. Pp. 36. 25 cents. 


These tables give coefficients for the approximate evaluation by quadrature of the Laplace transform 


\[ F(p) = \int_0^\infty e^{-pt} f(t)\,dt \]


of a function f(t) which is given for n equally spaced values of t whose range includes the effective range 
of the argument. The coefficients are given to nine decimal places for n = 2(1)11 and varying ranges and 
intervals of p depending on n (e.g. p = 0.1(0.1)1.0 for n = 2, p = 1(1)10 for n = 11). Special tables are 
also given for the simpler case where f(t) is a low order polynomial. A short introduction is provided by 
H. E. Salzer. In this he remarks: ‘Convenient estimates of the error of approximation seem difficult to 
obtain.’ However, in the example given of \(f(t) = J_0(t)\) it is found that for n = 11, F(p) differs from the 
approximating function by less than 0.14% over the range of p considered. D. E. BARTON 


(iii) Tables of Lagrangian Coefficients for sexagesimal interpolation. Applied 
Mathematics Series, 35. 1954. Pp. ix + 157. $2.00. 


These tables give 3-, 4-, 5- and 6-point Lagrangian interpolation coefficients \(A_i\) for arguments in
sexagesimal measure, such as angles given in units of degrees, minutes and seconds. The coefficients are given 
for 3600 values of the fraction of the tabular interval. Thus if the function is tabled for each degree, the 
coefficients may be used to find a value of the function at any required minute and second. Each
coefficient is tabled to eight decimal places. There is a brief Introduction by H. E. Salzer. 
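As an illustration of how such coefficients are used, the Python sketch below forms the standard 3-point Lagrangian coefficients for a fraction p of the tabular interval, namely p(p − 1)/2, 1 − p² and p(p + 1)/2, with p derived from minutes and seconds. The function names and the choice of sin x tabled at whole degrees are assumptions for the example; the published tables themselves are not reproduced.

```python
import math


def three_point_coefficients(minutes, seconds):
    """3-point Lagrangian coefficients for the nodes -1, 0, +1.

    p is the fraction of a one-degree tabular interval, in steps of 1/3600.
    """
    p = (60 * minutes + seconds) / 3600.0
    return (p * (p - 1) / 2, 1 - p * p, p * (p + 1) / 2)


def interpolate(f_prev, f_0, f_next, minutes, seconds):
    """Interpolated value at (minutes, seconds) past the middle tabular point."""
    a_m1, a_0, a_p1 = three_point_coefficients(minutes, seconds)
    return a_m1 * f_prev + a_0 * f_0 + a_p1 * f_next


# sin x tabled at 29, 30 and 31 degrees; wanted: sin(30 degrees 15 minutes).
tabled = {d: math.sin(math.radians(d)) for d in (29, 30, 31)}
estimate = interpolate(tabled[29], tabled[30], tabled[31], minutes=15, seconds=0)
exact = math.sin(math.radians(30.25))
print(estimate, exact, abs(estimate - exact))
```

The coefficients always sum to one, which gives a quick check on any entry read from a table of this kind.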




(iv) Tables of circular and hyperbolic sines and cosines for radian arguments. 
Applied Mathematics Series, 36. 1953. Pp. x+407. $3.00. 


The main Table I gives for x = 0(0.0001)1.9999 values to nine decimal places of the functions sin x, 
cos x, sinh x and cosh x. There are three short supplementary tables: 


Table II gives values to nine decimals of the same four functions for x = 0.0(0.1)10.0. 
Table III is a conversion table for expressing degrees, minutes and seconds in radians and vice versa. 
Table IV gives to 15 decimal places values of n × ½π for n = 1(1)100. 


There is a brief Introduction by A. N. Lowan. 


(v) Table of secants and cosecants to nine significant figures at hundredths of 
a degree. Applied Mathematics Series, 40. 1954. Pp. vi+46. 35 cents. 


This table gives sec x and cosec x for x = 0.00(0.01)90.00 degrees. It will serve as a companion to the 
table of sin x and cos x to fifteen decimal places at hundredths of a degree previously published as No. 5 
in the Applied Mathematics Series. E. S. PEARSON 




CORRIGENDA 
Biometrika (1954), 41 


(1) M. E. WISE, p. 328. For equation (8.5) read: 
Nz = (N 4n) 0-3) -4(p 1) 
(2) P. WHITTLE, p. 437. On the left-hand side of equation (16) 
for Er- ağı 05s read &— аб bE 4,1 


(3) H. A. DAVID, p. 466. In Table 2, the result for the rectangular population 
with n = 4: 
for 1.019 read 1.010 


(4) D. R. Cox, p. 472. In Table 2, for population (b), the value for n = 4: 
for 1.961 read 1.939, and for n = 5: for 2.252 read 2.196 


The Corrigenda to papers by Rushton and Ruben stated in the List of Contents for 
Vol. 41, Parts 3 and 4, as printed on p. 568, were wrongly placed in front of the first 
page (p. 287) of that issue. 



VOLUME 42, PARTS 3 AND 4 DECEMBER 1955 


POPULATION ESTIMATION BASED ON CHANGE OF COMPOSITION 
CAUSED BY A SELECTIVE REMOVAL 


By DOUGLAS G. CHAPMAN* 
Department of Mathematics, University of Washington and Readership in the 
Design and Analysis of Scientific Experiment, Oxford University 


1. INTRODUCTION 


It has been noted in wildlife studies that it is possible to estimate the size of an animal 
population on the basis of the change of sex ratio following a kill concentrated on one sex. 
It is difficult to ascertain to whom this idea should be credited, and what name should be 
given to it. Scattergood (1954), in a general survey, lists several references to the method. 
Chapman (1954) called the procedure the dichotomy method of estimation and showed that 
the usual ‘intuitive’ estimates are also maximum-likelihood estimates. It is not clear that 
this title is sufficiently descriptive, and in any case the classification does not necessarily 
need to be dichotomous. On the other hand, change of composition does seem to reflect the 
essential aspect of the procedure. 
The assumptions involved in this method appear to be simpler and more likely to be 
fulfilled than those underlying other population estimation procedures. Moreover, it will 
be shown that non-fulfilment of some of the assumptions may involve no serious con- 
sequences. This is one of the phases of the method studied in this paper. 

To be balanced against this is the fact that the amount of information yielded is smaller 
for the same effort than is obtainable from the more usual method of estimation based on 
the recapture of marked members of the population. A quantitative comparison is given 
below of the two methods. Some problems of optimum design in the sampling that
accompanies a change of composition estimate are considered and some extensions of the
procedure studied.


2. BINOMIAL MODEL 

The simplest situation arises if a ‘closed’ population is assumed, i.e. one where there is 
no emigration or immigration and births and deaths are negligible over the period of the 
experiment, except for the removal process. As to the removal process, all that is required
of it is that the final or cumulative total be known, together with its breakdown into at least
two classes or components. The third assumption for the simple binomial model is that random samples
have been taken of the population before and after the removal process. The randomness 
is in respect to the specified classes or components. 
The following notation is needed: 


Unknown parameters:
    $N_i$ = population size at time $t_i$ ($i = 0, 1$), made up of two classes $X$ and $Y$,
    $X_i$, $Y_i$ = size of classes $X$ and $Y$ at times $t_i$.


* Fellow of the Guggenheim Foundation.


It is convenient to write $P_i = X_i/N_i$ ($i = 0, 1$).

Known parameters:
    $n_i$ = size of random samples taken at times $t_i$,
    $R_x = X_0 - X_1$ = removal from class $X$,
    $R_y = Y_0 - Y_1$ = removal from class $Y$,
    $R = R_x + R_y$ = total removal.

Random variables:
    $x_i$, $y_i$ = the number of members of classes $X$ and $Y$ respectively in sample $i$ ($i = 0, 1$).

Analogous to $P_i$, we define $p_i = x_i/n_i$ ($i = 0, 1$).
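As noted in Chapman (1954) the maximum-likelihood estimates of $X_0$ and $N_0$ are
\[ \hat X_0 = \frac{x_0(n_1R_x - x_1R)}{n_1x_0 - n_0x_1} = \frac{p_0(R_x - p_1R)}{p_0 - p_1}, \tag{1} \]
\[ \hat N_0 = \frac{n_0(n_1R_x - x_1R)}{n_1x_0 - n_0x_1} = \frac{R_x - p_1R}{p_0 - p_1}, \tag{2} \]
while the asymptotic variances of $\hat X_0$ and $\hat N_0$ are
\[ \sigma^2(\hat X_0) = \frac{1}{(P_0 - P_1)^2} \left[ \frac{P_1^2X_0Y_0}{n_0} + \frac{P_0^2X_1Y_1}{n_1} \right], \tag{3} \]
\[ \sigma^2(\hat N_0) = \frac{1}{(P_0 - P_1)^2} \left[ \frac{X_0Y_0}{n_0} + \frac{X_1Y_1}{n_1} \right]. \tag{4} \]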

In the usage made of this method in wildlife studies it is $N_0$ that is ordinarily estimated,
based, as noted above, on the change of sex ratio. The sports kill records show the removal
of each sex. In many cases the kill is confined to males; this represents a very favourable
situation. It should be pointed out that under the assumptions made it is immaterial
whether attention is focused on time $t_0$ or time $t_1$.
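The estimates of this section can be sketched in a few lines of Python. This is only an illustration: the function name, the `Estimate` container and the sample figures are hypothetical, and substituting the estimates themselves for the unknown $X_i$, $Y_i$ in the large-sample variances is an assumption of the sketch rather than a step taken in the text.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    x0_hat: float  # estimate of X_0, the initial size of class X
    n0_hat: float  # estimate of N_0, the initial population size
    var_x0: float  # plug-in large-sample variance of x0_hat
    var_n0: float  # plug-in large-sample variance of n0_hat


def change_of_composition(x0, n0, x1, n1, r_x, r_y):
    """x0 of the n0 animals sampled before the removal are class X,
    x1 of the n1 sampled afterwards; r_x, r_y are the known removals."""
    r = r_x + r_y
    p0, p1 = x0 / n0, x1 / n1
    if p0 == p1:
        raise ValueError("no change of composition: p0 equals p1")
    n0_hat = (r_x - p1 * r) / (p0 - p1)
    x0_hat = p0 * n0_hat
    # Plug-in values for the class sizes before and after the removal.
    y0_hat = n0_hat - x0_hat
    x1_hat, y1_hat = x0_hat - r_x, y0_hat - r_y
    d2 = (p0 - p1) ** 2
    var_n0 = (x0_hat * y0_hat / n0 + x1_hat * y1_hat / n1) / d2
    var_x0 = (p1**2 * x0_hat * y0_hat / n0 + p0**2 * x1_hat * y1_hat / n1) / d2
    return Estimate(x0_hat, n0_hat, var_x0, var_n0)


# Hypothetical figures: 300 animals of class X removed, none of class Y.
est = change_of_composition(x0=120, n0=200, x1=90, n1=210, r_x=300, r_y=0)
print(round(est.n0_hat), round(est.x0_hat), round(est.var_n0 ** 0.5))
```

With these made-up samples the sample proportions equal the population proportions of a population with $X_0 = 600$, $Y_0 = 400$, so the point estimates return 1000 and 600 up to floating-point rounding.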

In the possible application of this procedure to the estimation of fish populations, it
must be recognized that the removal is not always selective by sex. However, here the classification
may be by size or by species. In many fisheries there is a size restriction on fish
that may be retained by the fishery. If the biologist can sample in a manner to include
randomly other sizes, then this change of composition estimation procedure may be used.
In this case, or in case the classification is by species, the experimenter will usually be more
interested in estimating $X_0$.

It may be suggested that where $X$ and $Y$ represent qualitatively different populations,
one should expect different catchabilities and consequently the procedure based on such a
classification would be unsatisfactory. However, in the rather important case that $R_y = 0$,
the estimate of $X_0$ remains asymptotically unbiased even if the catchability of the $X$ population
is not the same as that of the $Y$ population (for the experimenter or sampler). We repeat
that the catchabilities associated with the removal process are irrelevant.

For consider that $X$ and $Y$ have catchabilities $\lambda_x$ and $\lambda_y$ and assume the total sample
catch has a Poisson distribution. Then given $n_i$, the sample catch size, $x_i$ has a binomial
distribution with parameters $n_i$, $X_i/(N_i - \delta Y_i)$, where $\delta = 1 - \lambda_y/\lambda_x$.




Then replacing the random variables in $\hat X_0$ by their expectations (to which they will converge
in probability as the sample sizes are increased), we have
\[ \hat X_0 \to \frac{X_0(N_0R_x - X_0R - \delta Y_0R_x + \delta R_xR_y)}{(1 - \delta)(N_0R_x - X_0R)}, \tag{5} \]
and the result stated follows immediately.

However, $\hat N_0$ does not have the same desirable property, for replacing $x_i$, $y_i$ by their
expectation under the model of unequal catchabilities
\[ \hat N_0 \to \frac{(N_0 - \delta Y_0)(N_0R_x - X_0R - \delta Y_0R_x + \delta R_xR_y)}{(1 - \delta)(N_0R_x - X_0R)}. \tag{6} \]
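The effect of unequal catchability on the limiting values can be checked numerically. The sketch below assumes, as in the model of this section, that the sample proportions converge to $X_i/(N_i - \delta Y_i)$; the function name and the population figures are hypothetical.

```python
def limiting_estimates(x0, y0, r_x, r_y, delta):
    """Large-sample limits of the estimates of X_0 and N_0 when class Y is
    caught at relative rate 1 - delta compared with class X."""
    x1, y1 = x0 - r_x, y0 - r_y
    # Limiting sample proportions of class X before and after the removal.
    pi0 = x0 / (x0 + (1 - delta) * y0)
    pi1 = x1 / (x1 + (1 - delta) * y1)
    r = r_x + r_y
    n0_limit = (r_x - pi1 * r) / (pi0 - pi1)
    return pi0 * n0_limit, n0_limit


# Removal confined to class X: the X_0 limit is unaffected by delta,
# while the N_0 limit is N_0 - delta * Y_0.
x_lim, n_lim = limiting_estimates(x0=600, y0=400, r_x=300, r_y=0, delta=0.5)
print(x_lim, n_lim)

# Removal from both classes: the X_0 limit is now biased as well.
x_lim_both, _ = limiting_estimates(x0=600, y0=400, r_x=300, r_y=100, delta=0.5)
print(x_lim_both)
```

With $\delta = 0$ both limits reduce to the true values, which is a convenient sanity check on any implementation.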
Nevertheless, the fact that $X_0$ is still estimable indicates that it may be possible to use a
quite different species in this estimation procedure, e.g. a scrap fish may form the $Y$ class
while the desirable sports fish is the $X$ class. It should be pointed out that the distribution
of the estimate will be modified in case $\lambda_x \neq \lambda_y$, even though $R_y = 0$. In particular, in this
case the asymptotic variance of $\hat X_0$ is


1-8 [4X4 Р xy Р 7 
ed) = Ea a= Pat my а= "i 


The variation of $\sigma^2(\hat X_0(\delta))$ with respect to $\delta$ depends on all of the unknown parameters.
However, it is possible to state some useful qualitative results. These can be determined
simply from a study of the function

\[ f(\delta) = (1 - \delta)(1 - \delta P_i)^{-2} \qquad (\delta < 1), \]
which attains its maximum at $\delta = 2 - 1/P_i$. This maximum is $\{4P_i(1 - P_i)\}^{-1} \geq 1$.
The qualitative results are summarized in the following table:


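$P_0, P_1 < \tfrac12$    $\sigma^2(\hat X_0(\delta)) < \sigma^2(\hat X_0(0))$  |  $\sigma^2(\hat X_0(\delta)) > \sigma^2(\hat X_0(0))$
$P_0, P_1 > \tfrac12$    $\sigma^2(\hat X_0(\delta)) > \sigma^2(\hat X_0(0))$  |  $\sigma^2(\hat X_0(\delta)) < \sigma^2(\hat X_0(0))$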


The favourable cases occur in the main diagonal; they are favourable in the sense that if
$\sigma^2(\hat X_0(0))$ is used to estimate the variance or determine a confidence interval, any errors
incurred will be on the conservative side. In terms of a working rule for the sample considered,
if the undesirable fish are abundant relative to the desirable fish, then the procedure
is more satisfactory if the sampling device is such that the desirable fish are more en[...]
Since $f(\delta)$ achieves a maximum value of unity at $\delta = 0$ for $P_i = \tfrac12$, it is reasonable to
suggest that when the $P_i$ straddle $\tfrac12$ and do not differ greatly from it, $\sigma^2(\hat X_0(\delta))$ will differ
little from $\sigma^2(\hat X_0(0))$.


One other possibility of error in this procedure is that the recorded removals
may not be true totals removed from the population [...]
the removal will be under-evaluated because of the unreported [...]
natural mortality.




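Suppose that the actual removals are $R_x + \Delta_x$, $R_y + \Delta_y$, $R + \Delta$, where $\Delta_x + \Delta_y = \Delta$ and
where we may write $\Delta_x = KR_x$, $\Delta = KR + \epsilon$.

Then for $n_i \to \infty$, the estimate $\hat N_0$ converges in probability to
\[ N_0 \left[ \frac{N_0R_x - X_0R - \epsilon R_x}{(N_0R_x - X_0R)(1 + K) - \epsilon X_0} \right]. \tag{8} \]

It is convenient to denote the term in the brackets of (8) by $1 + b$. $\hat X_0$ also converges in
probability, under these conditions, to $X_0(1 + b)$.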

Consider first the case where $\epsilon = 0$, i.e. the unknown removal of $X$ and $Y$ is in the same
proportion as the known removal. Then
\[ 1 + b = (1 + K)^{-1} \approx 1 - K \quad \text{for } K \text{ small}; \]
hence the relative asymptotic bias is negative and of the order of the unknown proportion
of the removal.
For the more important case where $\epsilon$ is not zero we restrict attention to the situation
where $R_y = 0$. Then
\[ 1 + b = \frac{Y_0 - \epsilon}{Y_0(1 + K) - \epsilon X_0/R}. \]
If the mortality or unknown removal of the $X$'s and $Y$'s is proportional to the size of these
classes, then
\[ \Delta_y = \Delta_x Y_0/X_0 = Y_0KR/X_0 \]
and
\[ \Delta = KRN_0/X_0, \qquad \epsilon = Y_0KR/X_0. \]
Hence $b = -KR/X_0$.

In this situation the relative bias is still negative and of an even smaller order than in the
case where $\epsilon = 0$.
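The two special cases can be verified by evaluating the limiting ratio directly. The sketch below is a small numerical check under the notation of this section; the function name and the figures are hypothetical.

```python
def bias_factor(x0, y0, r_x, r_y, k, eps):
    """Limiting ratio 1 + b of the estimate of N_0 to its true value when the
    true removals are r_x * (1 + k) from class X and r * (1 + k) + eps in total."""
    n0 = x0 + y0
    r = r_x + r_y
    numerator = n0 * r_x - x0 * r - eps * r_x
    denominator = (n0 * r_x - x0 * r) * (1 + k) - eps * x0
    return numerator / denominator


# Unknown removal in the same proportion as the known removal (eps = 0).
print(bias_factor(600, 400, 300, 0, k=0.1, eps=0.0))

# Unknown removal proportional to class size, with r_y = 0:
# eps = y0 * k * r / x0, and the factor becomes 1 - k * r / x0.
eps = 400 * 0.1 * 300 / 600
print(bias_factor(600, 400, 300, 0, k=0.1, eps=eps))
```

In both cases the factor is below one, in line with the negative relative bias noted in the text.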

In the usage of (4) (or (3)) to estimate the large sample variances, in a situation where
part of the removal is unknown to the experimenter, errors will arise in the incorrect
estimation of the $X_i$, $Y_i$. It is easily seen that, while on the average, for sufficiently large
samples, $X_0$, $Y_0$ are underestimated, $X_1$ and $Y_1$ will be overestimated and consequently the
estimation errors will compensate. From a qualitative point of view, therefore, it appears
that the errors due to bias in the estimation of the variances are of much smaller consequence
than the sampling errors.

The cases considered above are by no means the only possibilities that might occur with 
a portion of the removal unknown to the experimenter. They represent, however, two cases 
of practical importance. Of course if no assumptions whatsoever are made as to the size
and relative magnitudes of $\Delta_x$ and $\Delta_y$, it is clear that they may be such as to vitiate the
estimation procedure.


3. OPTIMUM SAMPLE ALLOCATION 


Since this method of population estimation involves two samples over which the experimenter
has some degree of control, it is natural to inquire as to the optimum disposition of
effort between these samples. If $N_0$ is to be estimated with $\sigma^2(\hat N_0)$ to be minimized, subject
to the restriction that $n_0 + n_1$ be fixed, then elementary calculus leads to the following
equation for the ratio of $n_1$ to $n_0$:
\[ \frac{n_1}{n_0} = \left( \frac{X_1Y_1}{X_0Y_0} \right)^{1/2}. \tag{9} \]




Formula (9) involves parameters which are unknown at the outset of the experiment.
It is possible to use it to suggest qualitative rules of procedure. Consider the most favourable
case where the removal is restricted entirely to one class, say the $X$ class, i.e. $R_y = 0$. Then
\[ n_1 = n_0 \left( 1 - \frac{R}{X_0} \right)^{1/2}; \]
the ratio $n_1/n_0$ is given in Table 1 for various values of $R/X_0$, the proportion of the $X$ class
removed. Since the removal will rarely exceed 50 % of the removed class, this table suggests
that if $N_0$ alone is to be estimated, $n_1$ should be chosen only slightly smaller than $n_0$.


Table 1. Optimum sample ratio $n_1/n_0$ for the estimation of $N_0$
in case there is no removal in the $Y$ class

$R/X_0$      0.5    0.4    0.3    0.2    0.1    0.05    0.02    0.01
$n_1/n_0$    0.71   0.77   0.84   0.89   0.95   0.975   0.990   0.995


If $X_0$ is to be estimated, the situation is slightly more complicated, but a qualitative rule
is possible. Recalling $\sigma^2(\hat X_0)$ from formula (3), this is minimized with $n_0 + n_1$ fixed if
\[ \frac{n_1}{n_0} = \left( \frac{X_1Y_1}{X_0Y_0} \right)^{1/2} \frac{P_0}{P_1}. \tag{10} \]
Again, consider the case $Y_0 = Y_1$ and hence $R_x = R$. Then the optimum choice of $n_0$ and
$n_1$ is determined by
\[ \frac{n_1}{n_0} = \left( 1 - \frac{R}{N_0} \right) \left( 1 - \frac{R}{X_0} \right)^{-1/2}, \]
which is shown in Table 2.


Table 2. Optimum sample ratio $n_1/n_0$ for the estimate of $X_0$
in case there is no removal in the $Y$ class

Table 2 suggests that in a few extreme cases the second sample should be considerably
larger than the first for an optimum estimate of $X_0$; for most cases, however, the samples
should be chosen almost equal in size. Since often the biologist will wish to know both
$X_0$ and $N_0$, it would seem that choosing $n_0 = n_1$ represents a near optimum allocation of
effort.
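The allocation rules for the case of no removal in the $Y$ class are easy to tabulate. The sketch below is illustrative only: the function names are hypothetical, the rule for $X_0$ is obtained by minimizing the large-sample variance of the estimate of $X_0$ with $n_0 + n_1$ fixed, and the second loop is not a reproduction of the paper's Table 2.

```python
import math


def ratio_for_n0(r_over_x0):
    """Optimum n_1/n_0 for estimating N_0 when only class X is removed."""
    return math.sqrt(1 - r_over_x0)


def ratio_for_x0(r_over_x0, p0):
    """Optimum n_1/n_0 for estimating X_0 when only class X is removed.
    p0 is the initial proportion of class X, so that R/N_0 = p0 * R/X_0."""
    return (1 - p0 * r_over_x0) / math.sqrt(1 - r_over_x0)


for frac in (0.5, 0.4, 0.3, 0.2, 0.1):
    print(frac, round(ratio_for_n0(frac), 2))

for frac in (0.5, 0.1):
    for p0 in (0.2, 0.5, 0.8):
        print(frac, p0, round(ratio_for_x0(frac, p0), 3))
```

The first loop gives the leading entries of Table 1; the second shows how the ratio for $X_0$ moves above one when a large fraction of a relatively scarce class is removed.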




4. COMPARISON OF CAPTURE-RECAPTURE AND CHANGE
OF COMPOSITION ESTIMATION PROCEDURES


Rather than merely sampling $n_0 + n_1$ animals and classifying them as $X$ or $Y$, the experimenter
may tag or mark an equivalent number. The removal process will then act as a sample to 
estimate the tag ratio. It is perhaps reasonable to disregard the additional effort necessary 
to record accurately the tag recoveries. However, it is not reasonable to disregard the fact 
that tagging may be an operation that is more expensive, or requires more effort than 
classification. 

In view of the result of the last section it is reasonable to consider the case $n_0 = n_1 = n$
(say). For simplicity we restrict consideration to $R_y = 0$, $R_x = R$. Denote $R/N_0$ by $r$. Then
\[ \sigma^2(\hat N_0) = \frac{N_0^2}{n} \, \frac{X_0 + X_0 - R}{Y_0} \left[ \frac{1 - r}{r} \right]^2 = \frac{N_0^2}{n} \, \frac{2P_0 - r}{1 - P_0} \left[ \frac{1 - r}{r} \right]^2. \tag{11} \]

Now consider the estimate that could be made if $2\lambda n$ tags were placed in the population,
where $\lambda^{-1}$ is the ratio of the cost of tagging to the cost of classification and hence may
be assumed to be greater than or equal to 1.

The random tag recoveries follow a hypergeometric distribution under the usual assump- 
tion of random sampling without replacement; this hypergeometric distribution is often 
approximated by a Poisson distribution (e.g. Chapman, 1948). Here, if $n$ is considered to
be small relative to $N$, while $R$ is not necessarily so, then it may be shown by the usual
procedures (using Stirling's formula for the large factorials appearing in the hypergeometric
formula) that the distribution tends in the limit, as $N \to \infty$, while $n/N \to 0$ and $R/N$
is bounded away from zero, to the binomial distribution with parameters $(n, R/N)$. If this
limiting distribution is used as an approximation to the true hypergeometric distribution,
the asymptotic variance of the estimate of $N_0$ based on the tag recoveries (say $\hat N_0(t)$) is
\[ \sigma^2(\hat N_0(t)) = \frac{N_0^2(N_0 - R)}{2\lambda nR} = \frac{N_0^2(1 - r)}{2\lambda nr}. \tag{12} \]

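Hence we may ask under what conditions on $\lambda$, $r$ and $P_0$ is $\sigma^2(\hat N_0) < \sigma^2(\hat N_0(t))$, i.e.
\[ \left( \frac{2P_0 - r}{1 - P_0} \right) \left( \frac{1 - r}{r} \right) < \frac{1}{2\lambda}. \tag{13} \]
This inequality holds provided
\[ P_0 < \frac{r + 2r\lambda(1 - r)}{4\lambda(1 - r) + r}. \tag{14} \]
Let
\[ f(r, \lambda) = \frac{r + 2r\lambda(1 - r)}{4\lambda(1 - r) + r}. \tag{15} \]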


We observe that f(r,0) = 1, while f(r, 1) is a monotone function of r with f(0, 1) = 0, 
f (1, 1) = 1. The first statement in this sentence is the trivial remark that the change of com- 
position method must always be better if tagging is prohibitively costly. The second case, 
where tagging is no more costly than classification, is of some interest; we consider this 
case in the next paragraph. 




Elementary considerations show that for $r < 1$, $P_0 < f(r, 1)$ implies $P_0 < r$. Since the $Y$
removal is zero, it is impossible for the proportion of the population removed to exceed the 
original proportion of X's. Consequently, if tagging is no more costly than classification, 
from the large sample point of view, the tag sample procedure is under all circumstances 
more efficient than the change of composition estimation method. 
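The comparison can be explored numerically. In the sketch below the function names are hypothetical, the two variance functions assume $n_0 = n_1 = n$ and no removal from the $Y$ class, and `lam` stands for $\lambda$, the reciprocal of the relative cost of tagging.

```python
def threshold(r, lam):
    """f(r, lambda): the change of composition estimate has the smaller
    large-sample variance when P_0 is below this value (0 < r <= 1)."""
    return (r + 2 * r * lam * (1 - r)) / (4 * lam * (1 - r) + r)


def var_change_of_composition(n_pop, n, p0, r):
    """Large-sample variance of the change of composition estimate of N_0."""
    return n_pop**2 * (2 * p0 - r) * (1 - r) ** 2 / (n * (1 - p0) * r**2)


def var_tagging(n_pop, n, r, lam):
    """Large-sample variance of the tag-recovery estimate with 2 * lam * n tags."""
    return n_pop**2 * (1 - r) / (2 * lam * n * r)


# With equal costs (lam = 1) the threshold never exceeds r, yet P_0 >= r is
# forced when only class X is removed, so tagging always wins.
print(all(threshold(k / 10, 1.0) <= k / 10 + 1e-12 for k in range(1, 11)))

# With tagging ten times as costly (lam = 0.1) the answer depends on P_0.
r, lam = 0.2, 0.1
for p0 in (0.4, 0.5):
    coc = var_change_of_composition(1000, 100, p0, r)
    tag = var_tagging(1000, 100, r, lam)
    print(p0, p0 < threshold(r, lam), coc < tag)
```

The last loop shows the threshold and the direct variance comparison agreeing on either side of the critical value of $P_0$.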

Turning to the case $\lambda < 1$, we have used a different method of comparison. Tabled below
is the minimum value of $1/\lambda$ such that $\sigma^2(\hat N_0(t)) > \sigma^2(\hat N_0)$ for specified values of $P_0$.


Table 3. Minimum value of relative cost of tagging to cost of
classification such that $\sigma^2(\hat N_0(t)) > \sigma^2(\hat N_0)$


Since the comparison of these two procedures is made under the conditions most favour- 
able to the change of composition technique, these results suggest that in almost all cases 
the capture-recapture estimation procedure will yield more information for the same 
amount of effort. However, it is to be noted that the capture-recapture estimates are 
seriously dependent on the assumption that the initial capture does not alter the behaviour 
pattern of the animals in any way that will affect their probability of recapture. This 
assumption is difficult to test and it is certainly not always reasonable.


5. ESTIMATION WHEN $R_x$ AND $R_y$ MUST BE ESTIMATED


In the field, many additional complications may arise in the application of the change of
composition estimation procedure. Often $R$ is known reasonably well, but $R_x$ and $R_y$ are
not; it may even happen that $R$ itself must be estimated. These complications do not change
the estimates but will modify the estimated variances and any interval estimates.

We study the case where $R$ is known but $R_x$ and $R_y$ must be estimated from a sample of
size $m$ from the removed group. We assume that a sample of size $m$ is taken randomly, and
that of the $m$ animals, $m_x$ are from class $X$, $m_y$ from $Y$. Then the estimates of $X_0$ and $N_0$,
denoted by $\tilde X_0$, $\tilde N_0$, are simply (1) and (2) with $R_x$ replaced by $\hat R_x$, where
\[ \hat R_x = \frac{m_xR}{m}. \tag{16} \]



While the asymptotic variances of $\tilde X_0$ and $\tilde N_0$ may be determined by inverting a $3 \times 3$
information matrix, it is easier to compute them directly, for $\tilde X_0$ and $\tilde N_0$ are linear functions
of $\hat R_x$. Thus


P497. 5 dA В.В, 1-Р) ERCP 
en | Ee зал ato ы. 


This is the asymptotic variance, asymptotic in the sense that terms of higher order in
$n_0^{-1}$, $n_1^{-1}$ are neglected.

A similar formula can be derived for $\sigma^2(\tilde X_0)$, and the same method can be followed through
in the case that $R$ itself has to be estimated in some manner. It is of course assumed that the
estimation of $R$ is independent of the random variables $x_0$, $x_1$. The assumption is implicit
in the development above.




6. COMBINATION OF CAPTURE-RECAPTURE AND CHANGE OF COMPOSITION ESTIMATES 


It was noted in § 4 that, if the experimenter is prepared to accept the assumptions underlying
the capture-recapture estimates, then he will gain more from putting his effort into
tagging animals rather than merely classifying them. Actually, however, he may try to do
both, for in this way he may get two different estimates, or the combined procedure may
yield the most efficient single estimate. Moreover, the biologist will often wish to sample the
population on more than one occasion, in order to study growth or other changes in the
population. Consequently if the members of the initial sample are tagged as the basis of
a tag-sample estimate, the normal pattern of the study of the population will make possible
also the change of composition estimate (assuming of course a selective removal between
the samples).

In this situation the first problem that faces the experimenter is to determine
whether there is reasonable agreement between the two possible estimates. As noted above
in § 3, the binomial model with parameters $(n_0, R/N_0)$ is a reasonable approximation to represent
the capture-recapture procedure. Since the large-sample variances are known for the
change of composition estimate and for the estimate based on the number of tags recovered
(cf. formula (12)), a large-sample test is available to compare the two separate estimates.

If the estimates are not compatible it may be due to failure in the assumptions in one
or both models. These assumptions may be correct in regard to expectations, but the
sources of variability may be greater than the models permit, i.e. the distributions are not
binomial. This possibility will be considered in the next section.

On the other hand, there may be sources of error, such as were discussed in § 2, or are 
well known for capture-recapture estimates. These have been particularly well pointed out 
by De Lury (1954). It is pertinent to note that the most important sources of error tend to 
bias the change of composition estimate downward, while the important factors of trap- 
shyness and tag mortality will tend to inflate the tag sample estimates upwards. 

If the estimates are compatible, there is a suggestion but certainly no proof that the
assumptions of the two models are fulfilled. In this case it will be normal practice to combine
them. They may be averaged, weighting them inversely to their asymptotic variances.
However, for this simple model the maximum-likelihood equations are easily solved. We
need to add to the notation of § 2 the following:

    $s$ = number of the $n_0$ animals taken in sample 0
    which are recovered in the removal process.




It is assumed of course that these $n_0$ animals have been made identifiable by tagging or
marking in some manner. Then


ema ҮТ o 


The maximum-likelihood equations for $X_0$ and $N_0$ are seen to be of the same form as those
of the simple model of § 2. They may be written in the form
ا‎ ^ te ny nya +s 
کک وک‎ = - 
ТЭ ИРИК ca Y 
Уо 9m uo Баш ъй | © (20) 


7, AX ON 
and solved by analogy with those of the simple model. The maximum-likelihood estimators
are
\[ \hat X_0 = \frac{x_0[(n_1 - n_0 + s)R_x - x_1R]}{(n_1 - n_0 + s)x_0 - 2n_0x_1}, \tag{21} \]
\[ \hat N_0 = \frac{2n_0[(n_1 - n_0 + s)R_x - x_1R]}{(n_1 - n_0 + s)x_0 - 2n_0x_1}. \tag{22} \]
The information matrix is


LES SER 
LL Xx XE NIS 
b t (23) 
зт мх, т! 
MY NN: NS AM Ё ^ 
£ * ` LR I . ‚ & 
where 4 "DS E. M 24 
с, Ату, T: 
А КЕ AX (Р, By R -1 
Hence ptotic) с? Ne) = 2 EE . А 25 
| г" A (N6) ingen: sis| (25) | 
"o ny 


An analysis of (25) shows that the optimum allocation of $n_0$ and $n_1$ is made by choosing
$n_0$ the maximum possible, $n_1$ the minimum. This, of course, is necessarily true from the
results obtained in § 4, if the differential cost of tagging is disregarded. Consequently the
combined procedure outlined here will be used if there is reason to question the assumptions
of the tag-sample procedure, or if further sampling after the removal process will be made
for other reasons. In this case the question of optimum allocation does not enter.


7. EXTENSIONS FROM THE SIMPLE BINOMIAL MODEL 


There are several immediate extensions or generalizations of the model considered in § 2.
For example, there may be several samples with one removal or several removals with one
pair of samples, or several samples and several removals. The first two of these extensions
are trivial; the several selective removals between two random samples may simply be
combined by addition and the original model is obtained. If there are several samples either
before or after a single removal, they may be combined in the usual manner that samples
pertaining to the binomial model are combined.




Where there are several selective removals, each followed by a random sample, a new 
situation arises. This is of some interest, for the sampling process might be combined with 
the removal process to lead to such a situation, i.e. the sampler might only return the
members of the Y group to the population (though in general it is to be expected that the 
experimenter will prefer to return both X’s and Y’s marked or tagged). 

The following additional notation is needed: 


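    $R_{xi}$: removal of $X$ population prior to $i$th sample,
    $R_{yi}$: removal of $Y$ population prior to $i$th sample,
    $R_i = R_{xi} + R_{yi}$ $(i = 0, 1, 2, \ldots, k)$.
Also
\[ X_i = X_0 - \sum_{j=1}^{i} R_{xj}, \qquad N_i = N_0 - \sum_{j=1}^{i} R_j, \quad \text{etc.} \]
The maximum-likelihood equations for $X_0$ and $N_0$ may be written down simply as
\[ \sum_{i=0}^{k} \frac{x_i}{X_i} = \sum_{i=0}^{k} \frac{n_i}{N_i}, \qquad \sum_{i=0}^{k} \frac{y_i}{Y_i} = \sum_{i=0}^{k} \frac{n_i}{N_i}. \tag{26} \]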

These equations can be solved by iterative methods in any particular case. In case the
$R_{xi}$, $R_{yi}$ are small relative to $X_0$, $Y_0$, the following approximate solution may be employed
to yield estimates without the labour of the iterative procedure. The logarithm of the likelihood
function may be written


C S annt Xon ( -x) 


i-0 


0, 
Raw An R, 
+ K+ Šan R- РЕ да Synd- -ву- р + (27) 


so that the likelihood equations become 


А k ay А А k 8 
(1 -Р)У з; Б +Р,У, y; В„—Ре(1 -P)x nR; = 0, (28) 


k k k 

Ez X Rr, 2 Vi 2 yi y, 

po CE DR A i on 0. (29) 
Py NoP§ ER (1—Р,) No 

$\hat P_0$ may be determined from the quadratic equation (28) and this in turn substituted in (29).
The result is a simple linear equation for $\hat N_0$.

The possibility mentioned earlier that the selective removal might be performed by the 
sampler suggests a further possibility—to set up a sequential procedure for sampling, 
removing the $X$ group in the sample from the population and so on for $k$ steps where $k$ is
determined by the actual observations. This would be analogous to the sequential procedure 
proposed by Goodman (1953) for the tag sample method of estimation. 

Perhaps more important than any of these extensions is to consider the situation where 
sampling that conforms to the binomial model is impossible. Schooling and segregation by
size or sex or age will certainly be frequently encountered in many natural populations, so
that factors beyond the control of the experimenter make invalid the assumptions of the 




binomial model. This will be true even when he is cognizant of the principles of random 
sampling. Since these additional factors tend to produce more variability from sample to 
sample than is ascribable to the binomial model, it is necessary for the experimenter to 
replicate his samples to obtain an internal measure of this variability. 

Consider then several samples $n_{ij}$ ($i = 0, 1$; $j = 1, 2, \ldots, k_i$), each composed of $x_{ij}$ members
of the $X$ class and $y_{ij}$ members of the $Y$ class. To estimate $X_0$, $N_0$ a regression technique may
be used that has some resemblance to De Lury's procedure for population estimates from
the catch per unit of effort data (De Lury, 1947). To do so it is necessary to postulate that
the random variables $p_{ij} = x_{ij}/n_{ij}$ are approximately normal with mean $P_i$ and variance $\sigma_{ij}^2$.

Now define
\[ Z_{0j} = X_0 - p_{0j}N_0, \qquad Z_{1j} = X_0 - R_x - p_{1j}(N_0 - R). \tag{30} \]
Then
\[ E(Z_{ij}) = 0, \qquad \sigma^2(Z_{0j}) = N_0^2\sigma_{0j}^2, \qquad \sigma^2(Z_{1j}) = (N_0 - R)^2\sigma_{1j}^2. \tag{31} \]


If the underlying heterogeneity is considerable it is reasonable to assume that $\sigma_{ij}^2 = \sigma^2$ for
all $i$, $j$; though if the $n_{ij}$ differ considerably some note should be taken of this, presumably
following the ideas of Cochran (1942). Also for simplification the factors $N_0^2$, $(N_0 - R)^2$ in
the variances of the $Z_{ij}$'s are neglected. The maximum-likelihood equations for $X_0$ and $N_0$
then reduce to a form almost identical with (1) and (2), provided the single observations
there are replaced by means.

Thus letting $\bar p_{i\cdot} = \sum_{j=1}^{k_i} p_{ij}/k_i$,
\[ \hat N_0 = \frac{R_x - R\bar p_{1\cdot}}{\bar p_{0\cdot} - \bar p_{1\cdot}}, \tag{32} \]


while by routine methods it is found that
\[ \text{(Asymptotic)} \quad \sigma^2(\hat N_0) = \frac{\sigma^2}{(P_0 - P_1)^2} \left[ \frac{N_0^2}{k_0} + \frac{(N_0 - R)^2}{k_1} \right]. \tag{33} \]
It should be pointed out that (33) is a large sample variance if $k_0$, $k_1$ are large, not $n_0$, $n_1$
as was the case with respect to formulas (3) and (4).
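The replicated-sample estimate can be sketched as follows. This is a hedged illustration rather than the procedure of the paper: the function name and data are hypothetical, and estimating the common variance $\sigma^2$ by the pooled within-period variance of the replicate proportions is an assumption introduced here, since the text does not specify an estimator for it.

```python
from statistics import mean, variance


def replicated_estimate(p0_reps, p1_reps, r_x, r):
    """Estimate N_0 from replicate sample proportions of class X taken before
    (p0_reps) and after (p1_reps) a removal of r animals, r_x of them class X.
    Each list needs at least two replicates."""
    k0, k1 = len(p0_reps), len(p1_reps)
    p0_bar, p1_bar = mean(p0_reps), mean(p1_reps)
    n0_hat = (r_x - r * p1_bar) / (p0_bar - p1_bar)
    # Assumption: a common variance, estimated by pooling within periods.
    s2 = ((k0 - 1) * variance(p0_reps) + (k1 - 1) * variance(p1_reps)) / (k0 + k1 - 2)
    var_n0 = s2 / (p0_bar - p1_bar) ** 2 * (n0_hat**2 / k0 + (n0_hat - r) ** 2 / k1)
    return n0_hat, var_n0


# Hypothetical replicate proportions; 300 animals removed, all of class X.
n0_hat, var_n0 = replicated_estimate(
    [0.58, 0.62, 0.60], [0.42, 0.44, 0.43], r_x=300, r=300
)
print(round(n0_hat), round(var_n0 ** 0.5))
```

Because the variance is now estimated from the spread between replicates, it reflects schooling or segregation effects that a binomial formula would miss.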
Formula (33) may be used to estimate the magnitude of the sampling required for a
given removal, or the magnitude of the removal for given sampling, to achieve an estimate
with preassigned precision. Thus if $R_y = 0$, $k_0 = k_1 = k$ (say),
\[ \text{(Asymptotic)} \quad \sigma^2(\hat N_0) = \frac{\sigma^2N_0^2}{k} \, \frac{(1 - r)^2}{(1 - P_0)^2} \left[ \frac{2 - 2r + r^2}{r^2} \right]. \tag{34} \]

Unfortunately, it will often be true in advance that so little is known as to the possible
range of values for $\sigma^2$ that this formula cannot be used even to give qualitative results.
This procedure can be extended to the case of several selective removals, with a non-selective
sample taken between each removal. No new theory is involved although the
algebra becomes somewhat more tedious. However, such a process must necessarily take
place over time where mortality and possibly emigration and immigration should be
considered. It is therefore thought desirable to defer consideration of this case to a later
study, where the restriction that the population is closed is removed.


It is planned to illustrate the methods developed above in a further paper, using appro- 
priate data concerned with fish populations. 




REFERENCES 


CHAPMAN, D. G. (1948). A mathematical study of confidence limits of salmon populations calculated
from sample tag ratios. Bull. Int. Pacif. Salm. Fish. Comm. no. 2, pp. 67-85.
CHAPMAN, D. G. (1954). The estimation of biological populations. Ann. Math. Statist. 25, 1-15.
COCHRAN, W. G. (1942). Sampling theory when the units are of unequal sizes. J. Amer. Statist. Ass.
37, 199-212.
DE LURY, D. B. (1947). On the estimation of biological populations. Biometrics, 3, 145-67.
DE LURY, D. B. (1954). On the assumptions underlying estimates of mobile populations. Statistics
and Mathematics in Biology, pp. 287-93. Ames, Iowa: Iowa State College Press.
GOODMAN, LEO A. (1953). Sequential sampling tagging for population size problems. Ann. Math.
Statist. 24, 56-69.
SCATTERGOOD, L. W. (1954). Estimating fish and wildlife: a survey of methods. Statistics and Mathematics
in Biology, pp. 273-85. Ames, Iowa: Iowa State College Press.




AN AGE-DEPENDENT BIRTH AND DEATH PROCESS 


By W. A. O'N. WAUGH
Royal Military College of Science, Shrivenham 


1. INTRODUCTION 


Many examples of the stochastic processes known as branching or population processes 
have been developed. The assumptions leading to the various models have been decided 
partly by mathematical convenience and partly by the attempt to reproduce features con- 
trolling the development of biological or physical populations. The model we shall develop 
involves a continuous time parameter and an enumerable set of states, and we can compare 
it with several other processes of this type, e.g. the Yule-Furry birth process, Feller's
'birth and death' process (see Feller, 1939, 1950), and the non-Markovian birth process due
to D. G. Kendall (1948).

There are two motives for the development of the present model. Mathematically, it 
furnishes a conveniently soluble example of a non-Markovian process, in which transitions 
from a given state may be to one of several other states, and in this it is a generalization of 
Kendall’s process in which transitions are always from the given state to its successor, 
i.e. from state number n to state number n+ 1. For purposes of application it is based on 
assumptions which are not proposed as ideal but which bring in features which are perhaps 
a step in the direction of realism. Thus it may help in the development of yet more realistic 
models when they are required, and in indicating both the scope and the limitations of the 
models constructed under an assumption which has been widely adopted in population 
problems; viz. that the individuals reproduce independently of one another. 

Bellman & Harris (1948, 1952) have described a stochastic process similar to the one we 
shall examine, and we shall adopt their notation. They consider populations of particles 
which reproduce by fission independently of one another. They suppose that a ‘generation 
time’ is defined for these populations which is a random variable having a general distribution
function $G(t)$ (which does not depend on the size of the population, in accordance with
the assumption of independence just mentioned), so that $dG(t)$ expresses the probability
that a particle born at time $t = 0$ ends its life in the interval $(t, t + dt)$.

They assume that with given probabilities $q_0, q_1, q_2, \ldots$ any particle whose life has just
ended is replaced by $0, 1, 2, \ldots$ new particles.
Their detailed work is carried out for the pure birth process involving binary fission only:
\[ q_2 = 1, \quad q_0 = q_1 = q_3 = \ldots = 0. \]


They state that their results can easily be generalized to other modes of fission. Note that
when $q_0 \neq 0$ the process is a birth and death process. Further developments in this direction
have also been obtained by Ramakrishnan (1951) and by Reid (1953).

It may be convenient to note here that a focal point of the present investigation will be 
the coefficient of variation of the population size, considered as a function of the time. 
The importance of this function in the study of populations has been emphasized previously, 


e.g. by Kendall (1952a). 




2. AGE-DEPENDENT PROCESS INVOLVING TWO POSSIBLE FATES: DEATH OR FISSION 


For clarity and because it leads to a model which may be useful in applications, we shall 
not adopt, in the main part of our investigation, the most general assumptions that are 
possible. We shall indicate at various points throughout the paper certain further general- 
izations that merely involve difficulties of technique and alterations in detail which might 
be worked out if applications seemed to demand it. 

We shall suppose that at the end of its life an individual either disappears or is replaced
by two new individuals (i.e. death without issue or reproduction by binary fission). These
two new individuals then develop and reproduce independently of one another in the same
way as the parent individual, i.e. if the parent individual split at time $t$ to form them, then the
probability that either of the two new individuals ends its life during $(t + \tau, t + \tau + d\tau)$ is
$dG(\tau)$, the distribution $G(\tau)$ being fixed, and the same for all individuals of the population.
In the notation of Bellman & Harris this would be equivalent to putting
\[ q_0 \neq 0, \quad q_2 \neq 0, \quad q_1 = q_3 = \ldots = 0, \]
but we shall make the more general assumption that these probabilities depend on the age
at which the parent individual's life ends. A further generalization to admit the possibility
of an individual's facing more than two alternative fates is quite simple. We shall suppose
that at time $t = 0$ the population consists of just one newly born individual. Let us name the
alternatives which face an individual of the population; risk (b) is the event that its life ends
with the birth of two new individuals, risk (d) is the event that it dies without issue. We shall
derive the functions $q_i(t)$ and $G(t)$ as follows. Suppose that there are two functions $\lambda(t)$ and
$\nu(t)$ such that, given that an individual born at time 0 is alive at time $t$,
\[ \Pr\{\text{individual succumbs to risk (b) in } (t, t + dt)\} = \lambda(t)\,dt + o(dt), \]
\[ \Pr\{\text{individual succumbs to risk (d) in } (t, t + dt)\} = \nu(t)\,dt + o(dt), \]
thus
\[ \Pr\{\text{individual's life ends in } (t, t + dt)\} = [\lambda(t) + \nu(t)]\,dt + o(dt). \]
The function $[\lambda(t) + \nu(t)]$ might be called the 'force of mortality'. In the argument which
follows, and which leads to the fundamental integral equation, it is convenient if $\lambda(t)$ and
$\nu(t)$ are both continuous and do not vanish together at any point in $0 \le t < \infty$ and are such that
$\int_0^t [\lambda(u) + \nu(u)]\,du \to \infty$ as $t \to \infty$. In the latter part of the paper we assume a certain form for
$\lambda(t)$ and $\nu(t)$, and it happens that these requirements of continuity, etc., are fulfilled. However,
it is possible to extend the argument to cover cases in which certain discontinuities
are allowed, in particular the case where an instantaneous chain reaction can occur at the
moment of fission (a possibility which is mentioned by Bellman & Harris (1952)). In this
case our assumption of age-dependence permits the probabilities governing the numbers
of progeny at instantaneous fission to be different to those corresponding to fission after
a finite life length.

From $\lambda(t)$ and $\nu(t)$ we obtain the following conditional probabilities, given that the
individual's life ends in (t, t + dt):
$$\Pr\{\text{life ends by risk (b)}\} = q_2(t) = \frac{\lambda(t)}{\lambda(t)+\nu(t)},$$
$$\Pr\{\text{life ends by risk (d)}\} = q_0(t) = \frac{\nu(t)}{\lambda(t)+\nu(t)}.$$


W. A. O'N. WAUGH 293
Let
$$G(t) = 1 - \exp\left\{-\int_0^t [\lambda(u)+\nu(u)]\,du\right\}$$
$$= \Pr\{\text{individual born at time 0 lives for a time less than or equal to } t\}.$$


We can imagine an idealized population in which either the risk (d) is suspended and the
population develops under risk (b) alone ($\nu(t) \equiv 0$) or vice versa. Then
$$B(t) = 1 - \exp\left\{-\int_0^t \lambda(u)\,du\right\}$$
$$= \Pr\{\text{individual born at time 0 lives a shorter time than } t, \text{ when risk (d) is suspended}\},$$
$$D(t) = 1 - \exp\left\{-\int_0^t \nu(u)\,du\right\}$$
$$= \Pr\{\text{individual born at time 0 lives a shorter time than } t, \text{ when risk (b) is suspended}\}.$$

If
$$\frac{dB(t)}{dt} = b(t),\quad \frac{dD(t)}{dt} = d(t),\quad \frac{dG(t)}{dt} = g(t),$$
then
$$q_0(t) = \frac{d(t)\,[1-B(t)]}{g(t)}$$
and
$$q_2(t) = \frac{b(t)\,[1-D(t)]}{g(t)},$$
and also
$$1 - G(t) = [1-B(t)]\,[1-D(t)].$$
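The relations between the hazards and the densities can be checked in a few lines. The following is a sketch of that intermediate step, using only the symbols already defined in this section; it is a filled-in derivation, not a quotation from the paper.

```latex
\begin{aligned}
\lambda(t) &= \frac{b(t)}{1-B(t)}, \qquad \nu(t) = \frac{d(t)}{1-D(t)}, \\
1-G(t) &= \exp\Big\{-\int_0^t \lambda(u)\,du\Big\}\,\exp\Big\{-\int_0^t \nu(u)\,du\Big\}
        = [1-B(t)]\,[1-D(t)], \\
g(t) &= [\lambda(t)+\nu(t)]\,[1-G(t)] = b(t)\,[1-D(t)] + d(t)\,[1-B(t)], \\
q_2(t) &= \frac{\lambda(t)}{\lambda(t)+\nu(t)}
        = \frac{\lambda(t)\,[1-G(t)]}{g(t)}
        = \frac{b(t)\,[1-D(t)]}{g(t)}, \qquad
q_0(t) = \frac{d(t)\,[1-B(t)]}{g(t)}.
\end{aligned}
```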


3. THE INTEGRAL EQUATION FOR THE GENERATING FUNCTION OF THE POPULATION SIZE 


In order to obtain the mean and variance of the population size as functions of the time we
start from an integral equation for the generating function of the distribution of the population
size, which is very similar to the one from which Bellman & Harris (1952) derive their
results. It is possible to give a definition of a suitable sample space $\Omega$ and a function $r_t(\omega)$
over $\Omega$ which may be called the population size at time t, and by considerations of measurability
to show that $\{r_t : t \ge 0\}$ is a well-defined stochastic process, and furthermore to derive
the fundamental integral equation from this definition. However, we will content ourselves
with this reference to the more rigorous approach and will merely give the following intuitive
argument which leads to the integral equation.

Let $p_r(t)$ be the probability that the population size is r at time t, given that the population
consisted of a single newly born individual at time 0.

Let
$$\phi(z,t) = \sum_{r=0}^{\infty} p_r(t)\,z^r = E\,z^{r_t},$$
and let
$$h(z,t) = \sum_{n=0}^{\infty} q_n(t)\,z^n.$$


Then the situation at time t may have come about in either of two ways.
(i) The initial individual may still be alive.

(ii) The initial individual's life may have ended in the interval (u, u + du) where 0 < u < t.
In this case, if its life ends in fate (b), then because of the independence of the members of
the population and the fact that all individuals are assumed to develop under the same
probabilities as regards life length and number of progeny, we have effectively two populations
developing each for a period t - u, from initial ancestors newly born at time u. On the
other hand, if its life ends in fate (d) then the population size at time t is zero.

294 An age-dependent birth and death process

Hence we have the integral equation
$$\phi(z,t) = \int_0^t \{q_0(u) + q_2(u)[\phi(z,t-u)]^2\}\,g(u)\,du + z[1-G(t)]$$
$$= \int_0^t h[\phi(z,t-u),u]\,g(u)\,du + z[1-G(t)],$$
the latter form of which can easily be seen to hold when modes of fission other than binary
are considered.
In terms of the functions B(t), D(t) and their derivatives this can be written
$$\phi(z,t) = \int_0^t [\phi(z,t-u)]^2\,b(u)[1-D(u)]\,du + \int_0^t d(u)[1-B(u)]\,du + z[1-B(t)][1-D(t)].$$


If we differentiate this equation with respect to z and then put z = 1 we get the following
equation for the mean population size at time t,
$$\mu_1(t) = 2\int_0^t \mu_1(t-u)\,b(u)[1-D(u)]\,du + [1-B(t)][1-D(t)],$$
while if we differentiate twice with respect to z and then put z = 1 we get the equation
$$\mu_2(t) = 2\int_0^t [\mu_1(t-u)]^2\,b(u)[1-D(u)]\,du + 2\int_0^t \mu_2(t-u)\,b(u)[1-D(u)]\,du$$
for the second factorial moment of the population size.
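The equation for the mean is a renewal-type (Volterra) integral equation, so it can be solved numerically on a grid. The sketch below is not the paper's method: it assumes a simple trapezoid-rule discretization, and the function name `mean_population` and the rates used in the check are hypothetical. The check uses the Markovian case (constant rates), for which the exact mean exp((lam - nu) t) is well known.

```python
import math

def mean_population(w, surv, t_max, n_steps):
    """Solve mu(t) = 2 * int_0^t mu(t-u) w(u) du + surv(t) by the trapezoid rule.

    w(u)    : b(u) * [1 - D(u)], density of lives that end in fission at age u
    surv(t) : [1 - B(t)] * [1 - D(t)], probability the ancestor is still alive
    Returns the values of mu on the uniform grid 0, h, 2h, ..., t_max.
    """
    h = t_max / n_steps
    wv = [w(i * h) for i in range(n_steps + 1)]
    mu = [surv(0.0)]
    for n in range(1, n_steps + 1):
        # Trapezoid sum without the end-point term that contains the unknown mu[n].
        known = 0.5 * mu[0] * wv[n] + sum(mu[n - j] * wv[j] for j in range(1, n))
        # The mu[n] end-point term has been moved to the left-hand side.
        mu.append((surv(n * h) + 2.0 * h * known) / (1.0 - h * wv[0]))
    return mu

# Markovian check with hypothetical rates: the exact mean is exp((lam - nu) * t).
lam, nu = 1.0, 0.4
grid = mean_population(lambda u: lam * math.exp(-(lam + nu) * u),
                       lambda t: math.exp(-(lam + nu) * t),
                       t_max=2.0, n_steps=400)
print(grid[-1], math.exp((lam - nu) * 2.0))
```

The same routine accepts any pair `w`, `surv`, so the age-dependent forms of b(t) and D(t) can be substituted without changing the solver.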


4. THE OBSERVABLE LIFE-LENGTH DISTRIBUTIONS AND THE PROBABILITY OF EXTINCTION 


In experimental observations we should probably be able to separate the data on life length
into those for individuals ultimately suffering fate (b) and those for individuals ultimately
suffering fate (d). These observable distributions will be related to the distributions B(t)
and D(t) in a similar manner to that in which crude rates are related to net rates in a problem
of competing risks.

Let $B^*(t)$ be the probability that an individual newly born at time 0 suffers fate (b) before
time t. Then $B^*(\infty)$ is the probability that a newly born individual will ultimately suffer
fate (b).

Let $B^*(t \mid b)$ be the probability that an individual newly born at time 0 and known
ultimately to suffer fate (b) has a life length less than or equal to t.

Let $D^*(t)$, $D^*(\infty)$, $D^*(t \mid d)$ be the corresponding probabilities for fate (d).

The probability that an individual born at time 0 suffers fate (b) in the interval (t, t + dt) is
$$[1-G(t)]\,\lambda(t)\,dt + o(dt).$$
Hence
$$B^*(t) = \int_0^t [1-G(u)]\,\lambda(u)\,du,$$
and, since $B^*(t) = B^*(\infty)\,B^*(t \mid b)$,
$$B^*(t \mid b) = \int_0^t [1-G(u)]\,\lambda(u)\,du \Big/ \int_0^\infty [1-G(u)]\,\lambda(u)\,du,$$




with a similar result for $D^*(t \mid d)$. It will be desirable in any attempt to fit the model to
experimental data to give to $B^*(t \mid b)$ the form of the observed distribution of life lengths
for those individuals whose life ultimately terminates in fate (b), and similarly for $D^*(t \mid d)$.
We can express $B^*(t)$ in terms of B(t) and D(t) as follows.

Since
$$\lambda(t) = b(t)/[1-B(t)]$$
and
$$1 - G(t) = [1-B(t)]\,[1-D(t)],$$
therefore
$$B^*(t) = \int_0^t [1-D(u)]\,b(u)\,du,$$
and similarly
$$D^*(t) = \int_0^t [1-B(u)]\,d(u)\,du.$$


From the rigorous definition of Z(t) (the population size) as a random variable $r_t(\omega)$ over
a suitable sample space $\Omega$, to which we have referred, we can show that
$$p_0(t) = \Pr\{Z(t) = 0\}$$
is a solution of the integral equation
$$p_0(t) = \int_0^t \{q_0(u) + q_2(u)[p_0(t-u)]^2\}\,g(u)\,du.$$

Clearly $p_0(t)$ must be a bounded solution because we must have $0 \le p_0(t) \le 1$, and, because
$\Pr\{Z(t_2) = 0\} \ge \Pr\{Z(t_1) = 0\}$ for $t_2 > t_1$, $p_0(t)$ is a monotonic increasing function of t. Furthermore,
it can be shown, by a method due to Bellman & Harris, that the bounded solution of
the integral equation is unique.

Hence $p_0(t) \uparrow P \le 1$ as $t \to \infty$.

Furthermore, $p_0(t)$ is the sum of an indefinite integral and the convolution of a bounded
monotonic increasing function with a continuous function. Hence $p_0(t)$ is continuous.

Let $p_0(t) = P - \delta(t)$ so that $\delta(t) \downarrow 0$ as $t \to \infty$.

The integral equation can be written
$$P - \delta(t) = \int_0^t \{q_0(u) + P^2 q_2(u)\}\,g(u)\,du - 2P\int_0^t \delta(t-u)\,q_2(u)\,g(u)\,du + \int_0^t [\delta(t-u)]^2 q_2(u)\,g(u)\,du.$$

Note that $\int_0^\infty q_2(u)\,g(u)\,du \le 1$, that $\delta(t) \le 1$, and that by Cauchy's principle of convergence
$\int_s^t q_2(u)\,g(u)\,du < \epsilon$, where $\epsilon$ is any positive constant, for any t > s when s is sufficiently large.

Consider the second term in the equation above and divide the range of integration into two
parts:
$$\int_0^t \delta(t-u)\,q_2(u)\,g(u)\,du = \int_0^s \delta(t-u)\,q_2(u)\,g(u)\,du + \int_s^t \delta(t-u)\,q_2(u)\,g(u)\,du$$
$$\le \int_0^s \delta(t-u)\,q_2(u)\,g(u)\,du + \int_s^t q_2(u)\,g(u)\,du \quad (\text{because } 0 \le \delta(t) \le 1),$$
$$< \int_0^s \delta(t-u)\,q_2(u)\,g(u)\,du + \epsilon \quad \text{for sufficiently large } s,$$
$$< \epsilon\int_0^s q_2(u)\,g(u)\,du + \epsilon \quad \text{for sufficiently large } t > s,$$
$$\le 2\epsilon.$$


This argument and another similar one show that the last two terms on the right-hand side
of the above integral equation tend to zero as $t \to \infty$, and so we get
$$P = \int_0^\infty \{q_0(u) + P^2 q_2(u)\}\,g(u)\,du.$$
If
$$\rho = \int_0^\infty q_0(u)\,g(u)\,du \quad\text{and}\quad \sigma = \int_0^\infty q_2(u)\,g(u)\,du,$$
then, since by the assumption that $\int_0^t [\lambda(u)+\nu(u)]\,du \to \infty$ as $t \to \infty$ the function $G(t) \to 1$,
we have the equation
$$P = \rho + \sigma P^2, \quad\text{where } \rho + \sigma = 1.$$
The roots of this quadratic are P = 1, P = $\rho/\sigma$. If $\rho \ge \sigma$ we must therefore have $p_0(t) \to 1$
because the limit of $p_0(t)$ is not greater than 1.
On the other hand, if $\rho < \sigma$, the limit of $p_0(t)$ might be either $\rho/\sigma$ or 1.
Suppose the limit of $p_0(t)$ is 1. Then, for some $t_1$, $0 < t_1 < \infty$, we must have
$$p_0(t_1) = \theta, \quad\text{where } \rho/\sigma < \theta < 1.$$
Then, for this value of t,
$$\theta = \int_0^{t_1} \{q_0(u) + q_2(u)[p_0(t_1-u)]^2\}\,g(u)\,du \le \int_0^{t_1} \{q_0(u) + \theta^2 q_2(u)\}\,g(u)\,du.$$

Now if the upper limit of integration on the right-hand side is increased, the right-hand
side cannot decrease. Furthermore, if $q_2(t)\,g(t)$ or $q_0(t)\,g(t)$ is non-zero for $t > t_1$, it must
increase, if we increase the range of integration to $(0, \infty)$.

Hence
$$\theta < \rho + \sigma\theta^2, \quad\text{i.e.}\quad \sigma\theta^2 - \theta + \rho > 0.$$

But $\theta$ lies between the roots $\rho/\sigma$ and 1 of the left-hand side, which is a quadratic expression
that must be negative inside the interval $(\rho/\sigma, 1)$. Hence we have a contradiction, and
therefore $p_0(t)$ cannot take any value in the interval $(\rho/\sigma, 1)$.
On the other hand, if $q_0(t)\,g(t)$ and $q_2(t)\,g(t)$ vanish identically for $t > t_1$ we may have the
equality
$$\theta = \rho + \sigma\theta^2,$$
but this implies $\theta = \rho/\sigma$ or $\theta = 1$, which again contradicts the assumption $\rho/\sigma < \theta < 1$.
Hence we see that $p_0(t) \uparrow \min\{\rho/\sigma, 1\}$ and this limit is the probability of extinction.
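The result can be illustrated numerically. The sketch below assumes only that sigma is the probability that a life ends in fission and rho = 1 - sigma; the function names are hypothetical. The second function is not the paper's argument: it uses the familiar generation-by-generation iteration P <- rho + sigma P^2 started from 0, which converges to the smaller root of the quadratic and so should agree with min(rho/sigma, 1).

```python
def extinction_probability(sigma):
    """Return min(rho/sigma, 1), where rho = 1 - sigma."""
    rho = 1.0 - sigma
    return 1.0 if rho >= sigma else rho / sigma

def extinction_by_iteration(sigma, n_iter=500):
    """Iterate P <- rho + sigma * P**2 from P = 0 (approaches the smaller root)."""
    rho, p = 1.0 - sigma, 0.0
    for _ in range(n_iter):
        p = rho + sigma * p * p
    return p

for sigma in (0.4, 0.8):
    print(sigma, extinction_probability(sigma), extinction_by_iteration(sigma))
```

Convergence of the iteration is slow near the critical value sigma = 1/2, so the closed form is preferable there.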


5. EXAMPLE: THE SIMPLEST MARKOVIAN BIRTH AND DEATH PROCESS 
To illustrate the methods of the preceding sections we shall obtain the integral equation,
etc., for the birth and death process in which the probabilities are as follows.
For a particle alive at time t,
$$\Pr\{\text{split into 2 new particles during } (t, t+dt)\} = \lambda\,dt + o(dt),$$
$$\Pr\{\text{die during } (t, t+dt)\} = \nu\,dt + o(dt),$$
where $\lambda$ and $\nu$ are constants.

In this case $b(t) = \lambda e^{-\lambda t}$ and $d(t) = \nu e^{-\nu t}$. Hence we obtain
$$g(t) = (\lambda+\nu)\,e^{-(\lambda+\nu)t},$$
$$q_0(t) = \nu/(\lambda+\nu), \quad q_2(t) = \lambda/(\lambda+\nu),$$
and the integral equation is
$$\phi(z,t) = \int_0^t [\nu + \lambda\phi^2(z,t-u)]\,e^{-(\lambda+\nu)u}\,du + z\,e^{-(\lambda+\nu)t}.$$
For the observable life-length distributions we have
$$B^*(t \mid b) = \int_0^t \lambda e^{-(\lambda+\nu)u}\,du \Big/ \int_0^\infty \lambda e^{-(\lambda+\nu)u}\,du = 1 - e^{-(\lambda+\nu)t} = G(t),$$
and the same result holds for $D^*(t \mid d)$. For the extinction probabilities, since
$$\sigma = B^*(\infty) = \lambda/(\lambda+\nu) \quad\text{and}\quad \rho = D^*(\infty) = \nu/(\lambda+\nu),$$
the probability of extinction is equal to 1 if $\lambda \le \nu$ and is equal to $\nu/\lambda$ otherwise. These
particular results are of course all well known.


6. CHOICE OF A PARTICULAR FORM FOR b(t) AND d(t) 


Although the functions that appear in the basic statement of the birth and death process
are $\lambda(t)$ and $\nu(t)$, it is a clearer way of indicating the type of age dependence we are considering
if we give first the frequency functions of the life-length distributions. In the
ensuing sections of this paper we shall take
$$b(t) = \frac{(k\lambda)^k}{(k-1)!}\,t^{k-1}e^{-k\lambda t},$$
$$d(t) = \frac{(m\nu)^m}{(m-1)!}\,t^{m-1}e^{-m\nu t},$$
where $\lambda$ and $\nu$ are positive constants.

Probability distributions of this type have been used before in population problems (see
Kendall, 1952b) and also (much earlier) in the theory of queues (for an account of this see
Erlang, 1948). A 'microscopic' interpretation of them has been given in the birth process,
where only one fate is possible for each member of the population, i.e. in the binary fission
process with no death effect. This interpretation is that at the birth of a new particle a process
is started within it, which must go through a fixed number, k, of successive distinct phases.
The times spent in the successive phases are independent random variables, each having
distribution function $1 - e^{-k\lambda t}$. When the last phase is completed the particle subdivides and
forms two new particles. It follows that the 'generation time' is distributed like $\chi^2_{2k}/(2k\lambda)$.
Note that in giving this interpretation we do not intend to imply that if a real population
is found to have a generation time with this distribution then a process of k phases is
necessarily going on within the individuals. Our reason for adopting this form of distribution
of generation time is that it appears to give a fairly good fit in certain populations (see, for
example, Kendall (1948) and recent experimental results by Dr E. O. Powell (1955)), and a
real physical multiphase process may or may not be held to account for this.




We can extend this multi-phase interpretation to our model if we consider that at the
birth of a new particle two such processes start within it, one being of k phases and the other
of m phases. We suppose that the fate of the particle is decided as follows: if the k-phase
process (which we might call 'maturity') is completed first, the particle undergoes fate (b)
at the moment of completion; if the m-phase process (which we might call 'decay') is completed
first, the particle undergoes fate (d).

An important property of the above distributions is that as the 'number of phases'
k (or m) tends to infinity the distributions tend towards the corresponding deterministic
distributions, while for all k (or m) they have the fixed mean value $1/\lambda$ (or $1/\nu$).
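The two-clock interpretation lends itself to a small simulation. The sketch below is illustrative only: the function name `simulate_fates`, the seed and the parameter values are assumptions. Each particle draws a 'maturity' time as a gamma variate with shape k and mean 1/lam, and a 'decay' time with shape m and mean 1/nu; the earlier of the two decides the fate. For k = m = 1 the fraction of fissions should be close to lam/(lam + nu), the well-known result for two competing exponential clocks.

```python
import random

def simulate_fates(k, m, lam, nu, n, seed=0):
    """Estimate the fraction of particles whose life ends in fission, fate (b)."""
    rng = random.Random(seed)
    fissions = 0
    for _ in range(n):
        # gammavariate(shape, scale): shape k and scale 1/(k*lam) give mean 1/lam.
        maturity = rng.gammavariate(k, 1.0 / (k * lam))
        decay = rng.gammavariate(m, 1.0 / (m * nu))
        if maturity < decay:
            fissions += 1
    return fissions / n

# Hypothetical parameter values, chosen only for illustration.
print(simulate_fates(1, 1, 2.0, 1.0, 20000, seed=1))   # near 2/3 for exponential clocks
print(simulate_fates(3, 2, 1.0, 0.5, 20000, seed=1))
```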

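It follows from the above choice of b(t) and d(t) that
$$B(t) = 1 - e^{-k\lambda t}\left[1 + k\lambda t + \dots + \frac{(k\lambda t)^{k-1}}{(k-1)!}\right],$$
$$D(t) = 1 - e^{-m\nu t}\left[1 + m\nu t + \dots + \frac{(m\nu t)^{m-1}}{(m-1)!}\right],$$
$$\lambda(t) = \frac{k\lambda\,(k\lambda t)^{k-1}/(k-1)!}{1 + k\lambda t + \dots + (k\lambda t)^{k-1}/(k-1)!},$$
$$\nu(t) = \frac{m\nu\,(m\nu t)^{m-1}/(m-1)!}{1 + m\nu t + \dots + (m\nu t)^{m-1}/(m-1)!}.$$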


Making use of properties of the above forms of B(t) and D(t) we shall express each integral
equation in terms of convolutions of functions, and then reduce it to a linear differential
equation with constant coefficients. The subsequent solution depends on solving the
characteristic equation explicitly, and hence the equation for the mean population size.

In the next two sections we shall suppose that k > 1, but that m = 1. Roughly speaking
this assumption means that births occur in an age-dependent manner but deaths occur at
random. Because a complete solution is available and because the assumption may well be
realistic in some applications, we give the explicit solution for the mean and derive the
variance and coefficient of variation of the population size from it. We can compare our
results (i) with Kendall's (1948) uniform multi-phase pure birth process by putting $\nu = 0$,
and (ii) with Feller's Markovian birth and death process by putting k = m = 1. This leads,
of course, to the simple form of this process in which the birth and death rates do not depend
on the population size.

When the number of phases in both B(t) and D(t) exceeds 1 an explicit solution of the
characteristic equation is not readily obtainable, so we give asymptotic formulae for the
mean and variance of the population size in terms of the dominant root of the characteristic
equation.

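We shall find the following notation convenient.

Let
$$(\alpha) = \alpha e^{-\alpha t}.$$
Let
$$\int_0^t f(t-u)\,\alpha e^{-\alpha u}\,du = f(t) * (\alpha).$$
Thus $\alpha^2 t e^{-\alpha t} = (\alpha) * (\alpha)$. We shall write $(\alpha) * (\alpha) = (\alpha)^{*2}$, and, extending this analogy
with powers,
$$\frac{\alpha^n t^{n-1}}{(n-1)!}\,e^{-\alpha t} = (\alpha) * (\alpha) * \dots * (\alpha) = (\alpha)^{*n}.$$
Let
$$1 - e^{-\alpha t}\left[1 + \alpha t + \dots + \frac{(\alpha t)^{n-1}}{(n-1)!}\right] = \int_0^t \frac{\alpha^n u^{n-1}}{(n-1)!}\,e^{-\alpha u}\,du = 1 * (\alpha)^{*n}$$
be the distribution function of $\chi^2_{2n}/(2\alpha)$, where 1 is the function that is constant and equal to
unity for all t in $-\infty < t < \infty$.

Let $k\lambda + m\nu = \gamma$.

In this notation we can express the functions which occur in the integral equations as
follows:
$$b(t)[1-D(t)] = \frac{(k\lambda)^k}{(k-1)!}\,t^{k-1}e^{-\gamma t}\left[1 + m\nu t + \dots + \frac{(m\nu t)^{m-1}}{(m-1)!}\right]$$
$$= \left(\frac{k\lambda}{\gamma}\right)^k\left[(\gamma)^{*k} + \frac{m\nu}{\gamma}\,k\,(\gamma)^{*(k+1)} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,(\gamma)^{*(k+m-1)}\right].$$
Similarly
$$d(t)[1-B(t)] = \left(\frac{m\nu}{\gamma}\right)^m\left[(\gamma)^{*m} + \frac{k\lambda}{\gamma}\,m\,(\gamma)^{*(m+1)} + \dots + \left(\frac{k\lambda}{\gamma}\right)^{k-1}\frac{m(m+1)\dots(m+k-2)}{(k-1)!}\,(\gamma)^{*(m+k-1)}\right].$$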


The observable life-length distributions corresponding to probabilities of the type we
have adopted are given by
$$B^*(t) = \left(\frac{k\lambda}{\gamma}\right)^k\left[1 * (\gamma)^{*k} + \frac{m\nu}{\gamma}\,k\,1 * (\gamma)^{*(k+1)} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,1 * (\gamma)^{*(k+m-1)}\right],$$
whence $B^*(\infty)$ follows on noting that $1 * (\gamma)^{*(k+j)}$ tends to 1 as $t \to \infty$ for j = 0, 1, ..., m - 1.
Hence we can see that the distribution function $B^*(t \mid b)$, and similarly $D^*(t \mid d)$, is the sum
of constant multiples of several distribution functions each of $\chi^2$ form, i.e. it is a mixture of
$\chi^2$ variables (see Robbins, 1948).
The probability of extinction is equal to 1 if
$$\left(\frac{k\lambda}{\gamma}\right)^k\left[1 + k\,\frac{m\nu}{\gamma} + \dots + \frac{k(k+1)\dots(k+m-2)}{(m-1)!}\left(\frac{m\nu}{\gamma}\right)^{m-1}\right] \le \frac{1}{2}.$$
Note that when k = m this is equivalent to $\nu \ge \lambda$.
The expression above consists of the first m terms of the negative binomial distribution
whose generating function is $p^k(1 - qs)^{-k}$ when we put $p = k\lambda/\gamma$ and $q = m\nu/\gamma$, and thus the
given condition is equivalent to the statement that the median of this negative binomial
distribution shall be greater than or equal to m.
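The extinction condition is easy to evaluate directly. The following sketch computes the sum of the first m negative binomial terms with p = k lam / gamma and q = m nu / gamma; the function names `fission_probability` and `certain_extinction` are hypothetical, and the parameter values are chosen only to exercise the special cases mentioned in the text (k = m, and m = 1).

```python
from math import comb

def fission_probability(k, m, lam, nu):
    """B*(infinity): first m terms of the negative binomial with p = k*lam/gamma."""
    gamma = k * lam + m * nu
    p, q = k * lam / gamma, m * nu / gamma
    # comb(k + j - 1, j) equals k(k+1)...(k+j-1) / j!
    return p ** k * sum(comb(k + j - 1, j) * q ** j for j in range(m))

def certain_extinction(k, m, lam, nu):
    """True when the probability of extinction equals 1, i.e. B*(infinity) <= 1/2."""
    return fission_probability(k, m, lam, nu) <= 0.5

print(fission_probability(3, 3, 1.0, 1.0))   # k = m and nu = lam: the boundary case
print(certain_extinction(2, 2, 1.0, 0.5))    # nu < lam with k = m
```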




7. THE CASE OF RANDOM DEATH: MEAN POPULATION SIZE 
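In this case, putting m = 1, we have
$$b(t)[1-D(t)] = \left(\frac{k\lambda}{\gamma}\right)^k(\gamma)^{*k},$$
$$d(t)[1-B(t)] = \nu e^{-\gamma t}\left[1 + k\lambda t + \dots + \frac{(k\lambda t)^{k-1}}{(k-1)!}\right]$$
$$= \frac{\nu}{\gamma}\left[(\gamma) + \frac{k\lambda}{\gamma}(\gamma)^{*2} + \dots + \left(\frac{k\lambda}{\gamma}\right)^{k-1}(\gamma)^{*k}\right].$$

The corresponding observable life-length distributions are obtained as follows:
$$B^*(t) = \left(\frac{k\lambda}{\gamma}\right)^k 1 * (\gamma)^{*k}, \quad\text{whence}\quad B^*(\infty) = \left(\frac{k\lambda}{\gamma}\right)^k,$$
and so
$$B^*(t \mid b) = 1 * (\gamma)^{*k},$$
which is of just the same form as B(t) with the parameter $\gamma = k\lambda + \nu$ replacing $k\lambda$. The effect
of deaths occurring at random, on the observable birth rate, is merely to alter the constant
in the distribution without altering the form of the distribution whereby the dependence
of the birth-rate on the age of the parent is expressed.
$$D^*(t) = \frac{\nu}{\gamma}\left[1 * (\gamma) + \frac{k\lambda}{\gamma}\,1 * (\gamma)^{*2} + \dots + \left(\frac{k\lambda}{\gamma}\right)^{k-1} 1 * (\gamma)^{*k}\right],$$
and so
$$D^*(t \mid d) = \frac{\nu}{\gamma}\left[1 - \left(\frac{k\lambda}{\gamma}\right)^k\right]^{-1}\left[1 * (\gamma) + \frac{k\lambda}{\gamma}\,1 * (\gamma)^{*2} + \dots + \left(\frac{k\lambda}{\gamma}\right)^{k-1} 1 * (\gamma)^{*k}\right].$$

Thus, as might be expected, although the actual death-rate is independent of the age of the
individuals the effect of the age-dependent birth-rate is to produce an observable death-rate
which is age-dependent.

The probability of extinction is equal to 1 if $(k\lambda/\gamma)^k \le \tfrac{1}{2}$, i.e. if $\nu \ge k(2^{1/k} - 1)\lambda$, while if
$\nu < k(2^{1/k} - 1)\lambda$, then
$$\text{Probability of extinction} = \left[1 - \left(\frac{k\lambda}{\gamma}\right)^k\right] \Big/ \left(\frac{k\lambda}{\gamma}\right)^k = \left(1 + \frac{\nu}{k\lambda}\right)^k - 1.$$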

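In the case of random death the integral equation for the mean population size may be
written in the 'convolution' notation as
$$\mu_1(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k \mu_1(t) * (\gamma)^{*k} + 1 - 1 * g(t),$$
where we have put $1 - G(t) = 1 - 1 * g(t)$.
Let us define a differential operator,
$$\partial = 1 + \frac{1}{\gamma}\frac{d}{dt},$$
so that $\partial[f(t) * (\gamma)] = f(t)$.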
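The inverse relation between the operator and convolution with $(\gamma)$ can be verified directly. The lines below are a filled-in check, assuming the operator is $\partial = 1 + \gamma^{-1}\,d/dt$ and the convolution is $f(t) * (\gamma) = \int_0^t f(s)\,\gamma e^{-\gamma(t-s)}\,ds$.

```latex
\begin{aligned}
F(t) &= f(t) * (\gamma) = \int_0^t f(s)\,\gamma e^{-\gamma (t-s)}\,ds, \\
\frac{dF}{dt} &= \gamma f(t) - \gamma \int_0^t f(s)\,\gamma e^{-\gamma (t-s)}\,ds = \gamma f(t) - \gamma F(t), \\
\partial F(t) &= F(t) + \frac{1}{\gamma}\frac{dF}{dt} = F(t) + f(t) - F(t) = f(t).
\end{aligned}
```

Applying the operator n times therefore removes n convolution factors, which is what reduces the integral equation to a differential equation.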




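Put t = 0 in the integral equation and in the relations resulting from applying the
operators $\partial, \partial^2, \dots, \partial^{k-1}$ to it. This gives
$$\left[\partial^j \mu_1(t)\right]_{t=0} = \left(\frac{k\lambda}{\gamma}\right)^j \quad (j = 0, 1, \dots, k-1),$$
whence it follows that
$$\left[\left(\frac{d}{dt}\right)^j \mu_1(t)\right]_{t=0} = (-\nu)^j.$$

While $\partial^k$ gives the differential equation of the mean
$$\partial^k \mu_1(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k \mu_1(t) + 1 - \left(\frac{k\lambda}{\gamma}\right)^k - \frac{\nu}{\gamma}\left[1 + \frac{k\lambda}{\gamma} + \dots + \left(\frac{k\lambda}{\gamma}\right)^{k-1}\right].$$
The constant terms sum to zero so we get
$$\left\{\left(1 + \frac{1}{\gamma}\frac{d}{dt}\right)^k - 2\left(\frac{k\lambda}{\gamma}\right)^k\right\}\mu_1(t) = 0.$$

The characteristic equation for the above differential equation is
$$\left(1 + \frac{x}{\gamma}\right)^k - 2\left(\frac{k\lambda}{\gamma}\right)^k = 0.$$

If $\omega$ is the primitive kth root of unity this gives
$$x = 2^{1/k}k\lambda\,\omega^j - \gamma, \quad\text{where } j = 0, 1, \dots, k-1.$$
These roots lie on a circle in the complex plane, centre $-\gamma$, radius $2^{1/k}k\lambda$. The root of largest
real part is $x = 2^{1/k}k\lambda - \gamma = k\lambda(2^{1/k} - 1) - \nu$.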
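The roots can be listed and checked numerically. The sketch below assumes the characteristic equation $(1 + x/\gamma)^k = 2(k\lambda/\gamma)^k$ with $\gamma = k\lambda + \nu$; the function name `characteristic_roots` and the parameter values are hypothetical.

```python
import cmath
import math

def characteristic_roots(k, lam, nu):
    """Roots x_j = 2**(1/k) * k*lam * omega**j - gamma, for j = 0, ..., k-1."""
    gamma = k * lam + nu
    omega = cmath.exp(2j * math.pi / k)   # primitive kth root of unity
    return [2 ** (1.0 / k) * k * lam * omega ** j - gamma for j in range(k)]

k, lam, nu = 4, 1.0, 0.3
roots = characteristic_roots(k, lam, nu)
dominant = max(x.real for x in roots)
print(dominant, k * lam * (2 ** (1.0 / k) - 1) - nu)
```

The sign of the dominant root decides whether the mean population size decays, settles to a constant, or grows exponentially.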
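To obtain an explicit solution for the mean population size it is convenient to consider the
mean population size which results when there is no death effect ($\nu = 0$). In this case the
system reduces to the uniform multi-phase birth process described by Kendall (1948, 1952b).
If we call the mean population size $\eta(t)$ when $\nu = 0$ we see that it satisfies the equation
$$\left\{\left(1 + \frac{1}{k\lambda}\frac{d}{dt}\right)^k - 2\right\}\eta(t) = 0,$$
together with the end conditions
$$\eta(0) = 1 \quad\text{and}\quad \left(\frac{d}{dt}\right)^j \eta(0) = 0 \quad\text{for } j = 1, 2, \dots, k-1.$$

The solution of this system is
$$\eta(t) = e^{-k\lambda t}\sum_{r=0}^{\infty} 2^r\left[\frac{(k\lambda t)^{rk}}{(rk)!} + \frac{(k\lambda t)^{rk+1}}{(rk+1)!} + \dots + \frac{(k\lambda t)^{rk+k-1}}{(rk+k-1)!}\right]$$
$$= \frac{1}{2k}\sum_{j=0}^{k-1}\frac{2^{1/k}\omega^j}{2^{1/k}\omega^j - 1}\exp[k(2^{1/k}\omega^j - 1)\lambda t].$$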
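The two ways of writing the pure-birth mean can be compared numerically. The sketch below assumes the series form $e^{-k\lambda t}\sum_n 2^{\lfloor n/k \rfloor}(k\lambda t)^n/n!$ and the root-of-unity form $\frac{1}{2k}\sum_j \frac{z_j}{z_j - 1}\exp[k(z_j - 1)\lambda t]$ with $z_j = 2^{1/k}\omega^j$; the function names and the truncation length are hypothetical choices.

```python
import cmath
import math

def eta_series(k, lam, t, n_terms=400):
    """exp(-x) * sum_n 2**(n // k) * x**n / n!, with x = k*lam*t (truncated series)."""
    x = k * lam * t
    total, term = 0.0, 1.0          # term holds x**n / n!
    for n in range(n_terms):
        total += 2 ** (n // k) * term
        term *= x / (n + 1)
    return math.exp(-x) * total

def eta_roots(k, lam, t):
    """(1 / 2k) * sum_j z_j / (z_j - 1) * exp(k (z_j - 1) lam t), z_j = 2**(1/k) omega**j."""
    a = 2 ** (1.0 / k)
    total = 0j
    for j in range(k):
        z = a * cmath.exp(2j * math.pi * j / k)
        total += z / (z - 1) * cmath.exp(k * (z - 1) * lam * t)
    return (total / (2 * k)).real

for k in (1, 2, 5):
    print(k, eta_series(k, 1.0, 2.0), eta_roots(k, 1.0, 2.0))
```

For k = 1 both forms reduce to exp(lam t), the mean of the simple pure birth process.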


It follows that if we put
$$\mu_1(t) = e^{-\nu t}\,\eta(t),$$
then $\mu_1(t)$ will satisfy the differential equation, together with the end-conditions, that we
have derived for it above.

The value of the mean population size for large t depends on whether the dominant root
of the characteristic equation, i.e. the real root $(2^{1/k} - 1)k\lambda - \nu$, is less than, equal to, or
greater than 0. We shall consider these cases in turn.

(i) $\nu > k(2^{1/k} - 1)\lambda$. The mean population size tends to zero like $e^{-\kappa t}$, where $\kappa$ is a positive
constant, and it follows from this or from the value of $B^*(\infty)$ that the probability of
extinction is 1.

(ii) $\nu = k(2^{1/k} - 1)\lambda$. The probability of extinction is again equal to 1 but we obtain a
finite limiting mean population size
$$\mu_1(t) \to C_k, \quad\text{where}\quad C_k = \frac{2^{1/k}}{2k(2^{1/k} - 1)}.$$

(iii) $\nu < k(2^{1/k} - 1)\lambda$. The probability of extinction is equal to $\left(1 + \dfrac{\nu}{k\lambda}\right)^k - 1 < 1$. The mean
population size will increase exponentially and we can write the following asymptotic
expression for it:
$$\mu_1(t) \sim \frac{2^{1/k}}{2k(2^{1/k} - 1)}\exp[\{(2^{1/k} - 1)k\lambda - \nu\}t].$$


8. THE CASE OF RANDOM DEATH: COEFFICIENT OF VARIATION OF THE POPULATION SIZE


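Using the convolution notation we may express the integral equation connecting the mean
with the second factorial moment of the population size as follows:
$$\mu_2(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k\{\mu_1(t)\}^2 * (\gamma)^{*k} + 2\left(\frac{k\lambda}{\gamma}\right)^k\mu_2(t) * (\gamma)^{*k}.$$

By applying the differential operator $\partial$ as in the case of the equation for the mean this
reduces to
$$\left\{\partial^k - 2\left(\frac{k\lambda}{\gamma}\right)^k\right\}\mu_2(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k\{\mu_1(t)\}^2,$$
i.e. multiplying through by $\gamma^k$ and factorizing the operator
$$\prod_{j=0}^{k-1}\left[\frac{d}{dt} - (2^{1/k}\omega^j - 1)k\lambda + \nu\right]\mu_2(t) = 2(k\lambda)^k\{\mu_1(t)\}^2.$$
As with the mean, the behaviour of the second factorial moment as $t \to \infty$ depends on whether
the dominant root of the characteristic equation is negative, zero, or positive, and we shall
examine these cases in turn.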

(i) $\nu > k(2^{1/k} - 1)\lambda$. All terms in the solution for the second factorial moment will tend
to zero faster than $e^{-\kappa t}$, where $\kappa$ is some positive constant.

(ii) $\nu = k(2^{1/k} - 1)\lambda$. The term $\{\mu_1(t)\}^2$ on the right-hand side of the equation tends to
the finite limit $C_k^2$ as $t \to \infty$, while the equation is
$$\frac{d}{dt}\prod_{j=1}^{k-1}\left[\frac{d}{dt} - (2^{1/k}\omega^j - 1)k\lambda + \nu\right]\mu_2(t) = 2(k\lambda)^k\{\mu_1(t)\}^2.$$
Hence for large t we shall have an asymptotic solution:
$$\mu_2(t) \sim \frac{2(k\lambda)^k C_k^2\,t}{\prod_{j=1}^{k-1}\{\nu - (2^{1/k}\omega^j - 1)k\lambda\}} = \frac{2(k\lambda)^k C_k^2\,t}{k(k\lambda + \nu)^{k-1}}.$$
Hence the variance tends to infinity, but does so to the order of t rather than $e^{\kappa t}$ and in this
the behaviour of this age-dependent system resembles that of the simple Markovian birth
and death process when the parameters $\lambda$ and $\nu$ in it are equal.

Note that as $k \to \infty$ we have the condition $\nu = \lambda\log_e 2 \approx 0.69315\lambda$ for a finite limiting mean
population size.

(iii) $\nu < k(2^{1/k} - 1)\lambda$. The dominant term in the complementary function will be of the
order of $\exp[\{k(2^{1/k} - 1)\lambda - \nu\}t]$, but the particular integral will contain a term in
$\exp[\{k(2^{1/k} - 1)\lambda - \nu\}2t]$ which will therefore give the asymptotic value of the complete
solution. Formally we can write the particular integral as
$$\frac{2(k\lambda)^k}{(\gamma + d/dt)^k - 2(k\lambda)^k}\{\mu_1(t)\}^2,$$
and hence
$$\mu_2(t) \sim \frac{2(k\lambda)^k C_k^2\exp[2\{k(2^{1/k} - 1)\lambda - \nu\}t]}{\{(2^{1+1/k} - 1)k\lambda - \nu\}^k - 2(k\lambda)^k}.$$

Let $V_k(t)$ be the coefficient of variation of the population size. Then
$$V_k^2(t) = \frac{\mu_2(t)}{\{\mu_1(t)\}^2} + \frac{1}{\mu_1(t)} - 1,$$
and if we introduce the asymptotic values for the mean and second factorial moment, we
obtain for the coefficient of variation for large t
$$V_k^2 = \lim_{t \to \infty} V_k^2(t) = \frac{2(k\lambda)^k}{\{(2^{1+1/k} - 1)k\lambda - \nu\}^k - 2(k\lambda)^k} - 1 = \frac{2}{\{1 + (2\alpha_k - \nu/\lambda)/k\}^k - 2} - 1,$$
where $\alpha_k = k(2^{1/k} - 1)$.

When $k \to \infty$, the process tends to resemble one in which deaths occur at random but
fission occurs deterministically when the parent individual attains a certain fixed age equal
to $1/\lambda$. The limit as $k \to \infty$ of the above coefficient of variation, which is itself a limit for large
t, is thus of interest. Noting that $\alpha_k \to \log_e 2$ and that $\tfrac{1}{2} < e^{-\nu/\lambda} \le 1$, with equality only when
$\nu = 0$, we obtain the limit
$$\lim_{k \to \infty} V_k^2 = \frac{1}{2e^{-\nu/\lambda} - 1} - 1.$$

If $V_b$ is the coefficient of variation of the distribution B(t), then $V_b = 1/\sqrt{k}$ and so $V_k/V_b = V_k\sqrt{k}$
will have a finite limit only if $V_k \to 0$ as $k \to \infty$, i.e. if $\nu = 0$. This point will be of significance
if any attempt is made to estimate $V_b$ from observations of $V_k$, the latter being easier to
observe, for instance, in bacterial populations.
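The limiting coefficient of variation in the random-death case can be tabulated for different k. The sketch below assumes the form $V_k^2 = 2/[\{1 + (2\alpha_k - \nu/\lambda)/k\}^k - 2] - 1$ with $\alpha_k = k(2^{1/k} - 1)$, valid when $\nu < \alpha_k\lambda$, and the large-k limit $1/(2e^{-\nu/\lambda} - 1) - 1$; the function names are hypothetical. For k = 1 the value should reduce to (lam + nu)/(lam - nu), the familiar result for the simple Markovian process.

```python
import math

def cv_squared(k, lam, nu):
    """Limiting squared coefficient of variation of population size (m = 1)."""
    alpha_k = k * math.expm1(math.log(2.0) / k)     # k * (2**(1/k) - 1)
    if nu >= alpha_k * lam:
        raise ValueError("requires nu < k(2^(1/k) - 1) * lam (supercritical case)")
    return 2.0 / ((1.0 + (2.0 * alpha_k - nu / lam) / k) ** k - 2.0) - 1.0

def cv_squared_limit(lam, nu):
    """Limit of cv_squared as k -> infinity; requires nu < lam * log(2)."""
    return 1.0 / (2.0 * math.exp(-nu / lam) - 1.0) - 1.0

for k in (1, 2, 10, 100):
    print(k, cv_squared(k, 1.0, 0.3))
print("limit", cv_squared_limit(1.0, 0.3))
```

With nu = 0 the value shrinks roughly in proportion to 1/k, in line with the remark that the ratio to the generation-time coefficient of variation stays finite only when there is no death effect.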




9. BIRTH- AND DEATH-RATES BOTH AGE-DEPENDENT


We have seen that when k > 1 and m > 1 the instantaneous probabilities of death and of
fission both depend on the age of the individual considered. We will first investigate the
behaviour of the mean population size under these conditions. The integral equation for
the mean population size is
$$\mu_1(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k\mu_1(t) * \left[(\gamma)^{*k} + \frac{m\nu}{\gamma}\,k\,(\gamma)^{*(k+1)} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,(\gamma)^{*(k+m-1)}\right] + [1-B(t)][1-D(t)].$$
Now
$$[1-B(t)][1-D(t)] = e^{-\gamma t}P(t),$$
where P(t) is a polynomial of degree k + m - 2 in t. Hence
$$\partial^{k+m-1}\{[1-B(t)][1-D(t)]\} = \left(1 + \frac{1}{\gamma}\frac{d}{dt}\right)^{k+m-1}e^{-\gamma t}P(t) \equiv 0.$$

Thus, if we apply the operator $\partial^{k+m-1}$ to the above integral equation we obtain the following
homogeneous differential equation for the mean population size
$$F\left(\frac{d}{dt}\right)\mu_1(t) \equiv \left\{\partial^{k+m-1} - 2\left(\frac{k\lambda}{\gamma}\right)^k\left[\partial^{m-1} + \frac{m\nu}{\gamma}\,k\,\partial^{m-2} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\right]\right\}\mu_1(t) = 0.$$

The characteristic equation is F(x) = 0 and if we put $y = 1 + x/\gamma$ this is
$$H(y) \equiv y^{k+m-1} - 2\left(\frac{k\lambda}{\gamma}\right)^k\left[y^{m-1} + \frac{m\nu}{\gamma}\,k\,y^{m-2} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\right] = 0.$$
It does not seem as easy in this case as it was in the case of random death (m = 1) to obtain an
explicit solution of the characteristic equation. We will obtain an asymptotic solution for
the mean population size, and will consider only the case when the probability of extinction
is less than 1. The condition for the probability of extinction to be less than 1 is $\rho < \sigma$ and
this is the same as $1 - 2B^*(\infty) < 0$, i.e. H(1) < 0. Since $H(y) \to \infty$ as $y \to \infty$ it follows that the
characteristic equation H(y) = 0 must have at least one real root greater than 1. There is
just one change of sign in H(y) so by Descartes' rule of signs it follows that H(y) = 0 has
a unique real positive root, say w, which is simple and greater than 1.

We can show that w is larger than the real part of any other root of H(y) = 0. Let z = u + iv
be any complex root. We can ignore real roots and complex roots with $u \le 0$. We have,
rearranging H(z) = 0,
$$2\left(\frac{k\lambda}{\gamma}\right)^k\left[\frac{1}{z^k} + \frac{m\nu}{\gamma}\,k\,\frac{1}{z^{k+1}} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,\frac{1}{z^{k+m-1}}\right] = 1,$$
whence, taking moduli,
$$2\left(\frac{k\lambda}{\gamma}\right)^k\left[\frac{1}{|z|^k} + \frac{m\nu}{\gamma}\,k\,\frac{1}{|z|^{k+1}} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,\frac{1}{|z|^{k+m-1}}\right] \ge 1.$$
Now if z is complex, $v \ne 0$, and so |z| > u. The left-hand side, considered as a polynomial
in $|z|^{-1}$, is strictly increasing for any argument greater than 0, because all coefficients are
positive, so if we substitute $u^{-1}$ for $|z|^{-1}$ we can drop the sign of equality. Thus
$$2\left(\frac{k\lambda}{\gamma}\right)^k\left[\frac{1}{u^k} + \frac{m\nu}{\gamma}\,k\,\frac{1}{u^{k+1}} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,\frac{1}{u^{k+m-1}}\right]$$
$$> 1 = 2\left(\frac{k\lambda}{\gamma}\right)^k\left[\frac{1}{w^k} + \frac{m\nu}{\gamma}\,k\,\frac{1}{w^{k+1}} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,\frac{1}{w^{k+m-1}}\right],$$
whence w > u.

It follows that F(x) = 0 has a root r which is unique, simple, real and positive and larger
than the real part of any other root of F(x) = 0. Hence we have, as the asymptotic expression
for the mean population size
$$\mu_1(t) \sim c\,e^{rt},$$
where c is a constant which could in theory be evaluated, though to do so we should require
the other roots of the characteristic equation, and the appropriate end-conditions. However,
it is not essential to evaluate this constant, because it will disappear from the limiting
form of the coefficient of variation of the population size.
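Although no explicit formula is given for the dominant root, it is straightforward to locate numerically. The sketch below assumes $H(y) = y^{k+m-1} - 2p^k\sum_{j=0}^{m-1}\binom{k+j-1}{j}q^j y^{m-1-j}$ with $p = k\lambda/\gamma$, $q = m\nu/\gamma$, and finds the root w > 1 by bisection (a swapped-in numerical technique, not one used in the paper); the growth rate is then $r = \gamma(w - 1)$. The function names are hypothetical.

```python
from math import comb

def H(y, k, m, lam, nu):
    """Characteristic polynomial in y = 1 + x/gamma."""
    gamma = k * lam + m * nu
    p, q = k * lam / gamma, m * nu / gamma
    bracket = sum(comb(k + j - 1, j) * q ** j * y ** (m - 1 - j) for j in range(m))
    return y ** (k + m - 1) - 2.0 * p ** k * bracket

def dominant_root(k, m, lam, nu, tol=1e-13):
    """Return (w, r): the root w > 1 of H and the growth rate r = gamma * (w - 1)."""
    if H(1.0, k, m, lam, nu) >= 0.0:
        raise ValueError("H(1) >= 0: extinction is certain, no root above 1")
    lo, hi = 1.0, 2.0
    while H(hi, k, m, lam, nu) <= 0.0:   # widen the bracket if ever needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid, k, m, lam, nu) < 0.0:
            lo = mid
        else:
            hi = mid
    w = 0.5 * (lo + hi)
    return w, (k * lam + m * nu) * (w - 1.0)

print(dominant_root(3, 1, 1.0, 0.2))   # m = 1: r should equal k*lam*(2**(1/k) - 1) - nu
print(dominant_root(2, 3, 1.0, 0.2))
```

Bisection is used here because H changes sign exactly once on the positive axis, so the bracket [1, hi] always contains the single root of interest.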

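To write down the differential equation for the second factorial moment of the population
size we note that
$$b(t)[1-D(t)] = \left(\frac{k\lambda}{\gamma}\right)^k\left[(\gamma)^{*k} + \frac{m\nu}{\gamma}\,k\,(\gamma)^{*(k+1)} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\,(\gamma)^{*(k+m-1)}\right].$$

The integral equation for the second factorial moment is similar to the one which arose
in the case of random death, but it contains more terms.
It can be reduced to a differential equation in just the same way, by applying the operator
$\partial = \left(1 + \dfrac{1}{\gamma}\dfrac{d}{dt}\right)$, and the result is
$$F\left(\frac{d}{dt}\right)\mu_2(t) = 2\left(\frac{k\lambda}{\gamma}\right)^k\left[\partial^{m-1} + \frac{m\nu}{\gamma}\,k\,\partial^{m-2} + \dots + \left(\frac{m\nu}{\gamma}\right)^{m-1}\frac{k(k+1)\dots(k+m-2)}{(m-1)!}\right]\{\mu_1(t)\}^2.$$

We can see from the solution of the equation for the mean that the complementary function
for this equation will have a dominant term of the order of $e^{rt}$, where r is the same dominant
root of the characteristic equation. However, owing to the appearance of $\{\mu_1(t)\}^2$ on the
right-hand side we see that the particular integral will contain a term of the order of $e^{2rt}$
which will be the dominant term in the complete solution. Evaluating this term we obtain
$$\mu_2(t) \sim c^2 e^{2rt}\,\frac{(1 + 2r/\gamma)^{k+m-1} - F(2r)}{F(2r)}.$$

Note that the denominator $F(2r) \ne 0$.
It follows that the coefficient of variation of the population size for large t is given by
$$V_{k,m}^2 = \lim_{t \to \infty} V_{k,m}^2(t) = \frac{(1 + 2r/\gamma)^{k+m-1} - F(2r)}{F(2r)} - 1.$$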


In conclusion I should like to thank Mr D. G. Kendall for his continued help and
encouragement throughout the preparation of this paper.




REFERENCES 


BELLMAN, R. & HARRIS, T. E. (1948). On the theory of age-dependent stochastic branching processes.
Proc. Nat. Acad. Sci., Wash., 34, 601-4.

BELLMAN, R. & HARRIS, T. E. (1952). On age-dependent binary branching processes. Ann. Math.
55, 280-95.

ERLANG, A. K. (1948). The Life and Works of A. K. Erlang. Edited by E. Brockmeyer, H. L. Halstrøm
and Arne Jensen. Copenhagen: Trans. Danish Acad. Tech. Sciences, 1948, No. 2.

FELLER, W. (1939). Die Grundlagen der Volterraschen Theorie des Kampfes ums Dasein in
wahrscheinlichkeitstheoretischer Behandlung. Acta biotheor., Leiden, 5, 11-40.

FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: John
Wiley and Sons, Inc.

HARRIS, T. E. (1951). Some mathematical models for branching processes. Proc. 2nd Berkeley
Symposium on Mathematical Statistics and Probability, pp. 305-28.

KENDALL, D. G. (1948). On the role of variable generation time in the development of a stochastic
birth process. Biometrika, 35, 316-30.

KENDALL, D. G. (1952a). On the choice of a mathematical model to represent normal bacterial growth.
J. R. Statist. Soc. B, 14, 41-4.

KENDALL, D. G. (1952b). Les processus stochastiques de croissance en biologie. Ann. Inst. Poincaré,
13, 43-108.

POWELL, E. O. (1955). Some features of the generation times of individual bacteria. Biometrika,
42, 16-44.

RAMAKRISHNAN, ALLADI (1951). Some simple stochastic processes. J. R. Statist. Soc. B, 13, 131-40.

REID, A. T. (1953). An age-dependent stochastic model of population growth. Bull. Math. Biophys.
15, 361-5.

ROBBINS, H. (1948). Mixture of distributions. Ann. Math. Statist. 19, 360-69.




A LARGE-SAMPLE BIOASSAY DESIGN WITH RANDOM DOSES 
AND UNCERTAIN CONCENTRATION* 


By FRED C. ANDREWS AND HERMAN CHERNOFF
The University of Nebraska and Stanford University 


1. THE PROBLEM 


The problem discussed here is that of designing an experiment for the purpose of estimating 
a parameter in a dose-response curve when the doses administered cannot be known with 
exactness. 

To introduce the problem let us consider the following extreme example. A biologist has 
developed a strain of bacteria. He believes that this strain is so virulent that a dose of one 
organism applied to a test animal will lead to a response with a probability of the order of 
magnitude of 0.2. He wishes to estimate the virulence of this strain of bacteria more
precisely. For experimentation he has available thirty test animals and 10 ml. of material 
containing this strain of bacteria in suspension. The concentration of bacterial organisms in 
this material is about four organisms per millilitre, but this latter number might be un- 
reliable. The immediate problem confronting the biologist is one of design. Since he is not 
sure of the concentration he must use a portion of his material for a plate count (Clifton, 
1950, pp. 243—4) to estimate the concentration. Then he must allocate portions of his 
remaining material among the test animals to determine virulence. The amount of material 
to be used in the plate count cannot be too large, for then the remainder will not be an 
adequate amount of dose material. Some balancing allocation between these two parts of 
the experiment must be reached or the entire experiment will not yield its maximum 
amount of information. 

For the sake of the reader who prefers to peek at the last page of a mystery story, the
optimal design corresponding to the biologist's problem (when completely formulated) is
the following. To determine concentration 3.1 ml. of material are used. The remaining
6.9 ml. are divided equally among the thirty test animals.

According to this design about twenty-eight organisms are divided among thirty animals.
It is obvious that not all of the animals are assured of receiving organisms. In fact, the
dosage is random, and some animals might receive none, some might receive one and still
others might receive two or more organisms. This points out that in our problem exact doses
are not known for two reasons. First, the concentration is at best an estimate based on the
limited amount of material assayed. Secondly, even if the concentration was known, the
exact dose administered to an animal is a random variable which the biologist cannot
observe directly. This second situation will be referred to as dosage subject to error. Dosage
subject to error occurs frequently in practice but is seldom treated in theory (see Haley,
1953), although such theory is especially relevant when the doses are small.
The problem facing the biologist can be formulated briefly by stating that the fractions 
$f_0, f_1, f_2, \ldots, f_{30}$ must be determined with the understanding that $f_0$ is the fraction of the 
material to be used in the plate count and $f_i$ the part to be used to dose the ith test animal. 
Of course these fractions are to be determined in such a manner that an 'optimal' design 
for estimating the dose response curve will result. 

... the U.S. Army, Navy and Air Force through the Joint Services 
Advisory Committee for Research Groups in Applied Mathematics and Statistics. 


2. MODEL 


In order to formulate completely this design problem it is necessary to specify carefully 
the nature of the probability distributions involved. 

First, the dose-response curve must be specified. In this paper, we shall treat only the 
case where our biologist has reason to believe that if d organisms are administered to a test 
animal the probability of no response is given by 

$$p_d = (1-\alpha)^d. \tag{1}$$

The parameter $\alpha$ represents the probability that a dose of one organism will lead to a positive 
response. This dose-response curve is said to be exponential because the dose d appears in 
the exponent. This formula may also be written 

$$p_d = e^{d\log_e(1-\alpha)}. \tag{2}$$

This model has previously been discussed in the literature by Goldberg & Watkins (1952), 
Druett (1952) and Peto (1953), and in a different context by Cochran (1950). 

Secondly, we must consider the distribution of the number of organisms appearing in a 
certain amount of the material. Frequently the law of small numbers is assumed in this 
situation (see Worcester, 1954) yielding: If the available material is taken from a source 
where the concentration is c organisms per millilitre, the number of organisms x appearing in a 
sample of r millilitres is a Poisson random variable with mean cr. That is, 

$$P(x=i) = \frac{e^{-cr}(cr)^i}{i!} \qquad (i = 0,1,2,\ldots). \tag{3}$$

Furthermore, if several samples are taken from the source (without replacement) the numbers of 
organisms in these samples are independently distributed. 

As an example, suppose that one-half of the tube of 10 ml. were used in the plate count. 
Suppose also that the concentration of the source was exactly 4 organisms per millilitre. 
Then the number of organisms in the part plated would be a Poisson random variable with 
mean (expectation) $4 \times 10 \times \tfrac12 = 20$. When the dose d is a Poisson random variable with 
expectation D, we shall say that D is the nominal dose. Then 

$$P(d=i) = \frac{e^{-D}D^i}{i!} \qquad (i = 0,1,2,\ldots). \tag{4}$$
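
The Poisson law used in the example can be evaluated directly. The following is a minimal sketch, assuming the concentration is exactly 4 organisms per millilitre as in the example; the function name `poisson_pmf` and the choice of the range 15 to 25 are illustrative only and do not come from the paper.

```python
import math

def poisson_pmf(i, mean):
    """P(x = i) for a Poisson count with the given mean (the form of equation (3))."""
    return math.exp(-mean) * mean ** i / math.factorial(i)

# Illustration values taken from the example in the text.
concentration = 4.0          # organisms per millilitre (assumed exactly known here)
volume_plated = 10.0 * 0.5   # one-half of the 10 ml. tube
mean_count = concentration * volume_plated

# Chance that the plated half contains between 15 and 25 organisms inclusive.
prob_15_to_25 = sum(poisson_pmf(i, mean_count) for i in range(15, 26))
print(mean_count, prob_15_to_25)
```

Even with the concentration known exactly, the count in the plated half varies noticeably around its mean of 20, which is why the plate count can only estimate the concentration.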

We might repeat here that in our problem the biologist does not know the concentration 
and must use the plate count to estimate the concentration of the source or equivalently the 
expected number of organisms in the test-tube. Let $\lambda$ be a notation for the expected number 
of organisms in the test-tube. If then the biologist uses a fraction f of his material as a dose, 
the nominal dose is $f\lambda$. 


3. THE EFFECT OF ERROR IN DOSE 


As was pointed out in the introduction, the exact number of organisms in a dose is a random 
variable. Ordinarily, as in the case with the probit or logit models, the randomness of the 
dose very seriously complicates the mathematical situation, as is quite evident from reading 
Haley (1953). It is, however, fortunate that with the use of the exponential model this 
complication is minimized. 

Suppose that a nominal dose D is applied, then the probability of a negative response is 
given by 

$$P_D = \sum_{d=0}^{\infty} \frac{e^{-D}D^d}{d!}(1-\alpha)^d = e^{-\alpha D}.$$

Contrast this with the probability of a negative response when the exact dose is d, namely, 

$$p_d = (1-\alpha)^d = e^{\log_e(1-\alpha)\,d}.$$

It is fundamental here to see that the response curve for a nominal dose is also exponential 
in form. The only difference in the two curves is that $\log_e(1-\alpha)$ in the exact dose curve is 
replaced by $-\alpha$ in the nominal dose curve. (For small $\alpha$, $\log_e(1-\alpha)$ is almost equal to $-\alpha$.) 
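
The reduction of the randomized dose to a single exponential can be checked numerically. The sketch below averages the exact-dose curve $(1-\alpha)^d$ over a Poisson dose with mean D and compares the result with $e^{-\alpha D}$; the function names and the truncation at 200 terms are assumptions made for illustration.

```python
import math

def negative_response_exact_dose(alpha, d):
    """Probability of no response when exactly d organisms are given."""
    return (1.0 - alpha) ** d

def negative_response_nominal_dose(alpha, nominal_dose, terms=200):
    """Average of (1 - alpha)^d over d ~ Poisson(nominal_dose), truncated at `terms`."""
    total = 0.0
    pmf = math.exp(-nominal_dose)          # P(d = 0)
    for d in range(terms):
        total += pmf * negative_response_exact_dose(alpha, d)
        pmf *= nominal_dose / (d + 1)      # recurrence giving P(d + 1) from P(d)
    return total

alpha, nominal_dose = 0.2, 8.0
averaged = negative_response_nominal_dose(alpha, nominal_dose)
closed_form = math.exp(-alpha * nominal_dose)
print(averaged, closed_form)
```

The two printed values agree to rounding error, which is the sense in which dosage subject to error causes no extra difficulty under the exponential model.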


4. THE NATURE OF AN ASYMPTOTICALLY LOCALLY OPTIMAL DESIGN. AN EXAMPLE 


The problem we have posed in § 1 yields a solution which might be termed a large-sample 
locally optimal design. To illustrate what we mean by these terms, we shall use a simpler 
problem which has already been treated. 

Suppose that it is desired to dose a large number of test animals with organisms from a 
material with known concentration and that there is an unlimited supply of organisms 
available. This problem reduces then to selecting nominal doses $D_1, D_2, D_3, \ldots$ to administer 
to the animals. Once such nominal doses are selected and observations obtained, one may 
apply maximum-likelihood or some other asymptotically efficient estimation technique. 
It is known (though not trivial) how to apply the maximum-likelihood method of estimation 
and how to obtain the asymptotic variance of the resulting estimate of $\alpha$. By an asymptotically 
optimal design we mean one which minimizes the asymptotic variance. It turns out 
for the exponential model that the asymptotic variance is minimized when all the doses 
are the same and these are equal to 

$$D = \frac{1.6}{\alpha}.$$

Here the local character of the problem becomes evident. That is, the asymptotically 
optimal design to estimate $\alpha$ depends on $\alpha$. But it is clear that if $\alpha$ were known, there would 
be no point to the experiment. This paradoxical situation is resolved by the following 
consideration. In practice, the biologist will have a rough idea of the order of magnitude 
of $\alpha$. Suppose he felt he could guess $\alpha$ of the order of magnitude of 0.2, then he could use the 
dose 1.6/0.2 in place of $1.6/\alpha$. It can easily be shown that the use of the wrong dose level 
increases the asymptotic variance of the estimate of $\alpha$. However, this increase is very 
gradual. If we measure the dose in terms of percentage of the asymptotically optimal dose 
and the corresponding variance similarly we obtain the following table: 


Dose/optimal dose Variance/optimal variance 




It is clear that if the biologist has any reasonable educated guess about $\alpha$ he could use the 
design which would be optimal if his guess were the correct value of $\alpha$. If, however, his best 
guess might be off by a factor of three or more, alternative procedures would have to be 
considered. For example, he might use a small preliminary experiment involving several 
dose levels which are very highly spread out. Using the estimate of $\alpha$ based on this preliminary 
experiment he could then apply the locally optimal design. 

The gradual increase in variance divided by optimal variance shows not only the applicability 
of the locally optimal design. It can also be used to show that designs where dose/optimal 
dose takes on several values from 0.75 to 1.25 are almost optimal. While such 
designs lead to more complexity in the calculation of maximum-likelihood estimates they 
are useful to biologists who have reservations about the applicability of the exponential 
model to their problem. 
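
The gradual loss from using a wrong dose level can be tabulated with a few lines of code. This is a sketch under the assumption that, for a common nominal dose D, the asymptotic variance of the estimate of $\alpha$ is inversely proportional to $g(u) = u^2e^{-u}/(1-e^{-u})$ with $u = \alpha D$ (the function studied in the appendix); the helper names are hypothetical.

```python
import math

def g(u):
    """Information about alpha per animal, up to a factor 1/alpha^2, at u = alpha * D."""
    return u * u * math.exp(-u) / (1.0 - math.exp(-u))

def optimal_u(lo=1.0, hi=2.0, tol=1e-12):
    """Root of (1 - e^{-u}) - u/2 = 0 found by bisection (close to 1.6)."""
    f = lambda u: (1.0 - math.exp(-u)) - 0.5 * u
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def variance_ratio(dose_ratio):
    """Variance at dose = dose_ratio * optimal dose, divided by the optimal variance."""
    u_star = optimal_u()
    return g(u_star) / g(dose_ratio * u_star)

for ratio in (0.5, 0.75, 1.0, 1.25, 1.5, 2.0, 3.0):
    print(ratio, round(variance_ratio(ratio), 3))
```

Under this assumption the ratio stays within a few per cent of 1 for dose ratios between 0.75 and 1.25, and only becomes large when the guess is off by a factor of about three.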

5. RESULTS 


As may have been expected, the large-sample optimal solution to our design problem involving 
an assay is a local solution. In fact, it will be shown in the appendix that the solution 
has the following characteristics: 

(1) Every test animal is given an equal fraction of the test-tube. According to the dictates 
of the mathematical model there is no advantage in limiting the number of animals used to 
allow larger individual doses. Of course costs of obtaining individual animals would fore- 
stall the use of an unlimited number of animals in any one experiment. 


(2) The fraction of the test-tube administered to test animals is given by $\beta^0$, which is a 
function of $\alpha$ (probability that one organism will cause a positive reaction), $\lambda$ (the expected 
number of organisms in the test-tube), and s (the number of test animals available). 

This functional dependence is rather complicated, but $\beta^0$ is very well approximated by 
$\beta^*$, which is obtained from the following simple formula: 

$$\beta^* = \text{minimum}\left(\frac{1}{1+\sqrt{\alpha}},\ \frac{1.6\,s}{\alpha\lambda}\right). \tag{5}$$

Hence if the numbers given in the statement were relevant we would have 

$$\lambda \approx 40,\quad \alpha \approx 0.2,\quad s = 30,\quad \frac{1}{1+\sqrt{\alpha}} \approx 0.69,\quad \frac{1.6\,s}{\alpha\lambda} \approx 6.0,\quad \beta^* \approx 0.69.$$


A remark about how $\beta^*$ was obtained might be pertinent here. If a very large number of 
organisms were available it would be desirable to give each animal a dose of about $1.6/\alpha$ 
and to use the large amount remaining for the plate count. Then the fraction of the test-tube 
used for doses would be about $1.6s/(\alpha\lambda)$. However, if there are very few organisms available, 
it turns out that $\sqrt{\alpha}/(1+\sqrt{\alpha})$ is approximately the right fraction to use in plate counting, 
leaving the fraction $1/(1+\sqrt{\alpha})$ for doses. 

(3) The local nature of the solution is not a serious handicap to applications. 

As in the example treated in § 4, the asymptotic variance undergoes a relatively small 
change even when the fraction devoted to doses is changed by a considerable amount. This 
can be seen by referring to Figs. 1 and 2 and Table 1. Table 1 gives $\beta^0$ as a function of $\alpha$ 
and $\lambda$ for s = 30. In addition, there are presented $\underline{\beta}$, $\bar{\beta}$ and $\sigma^2(\hat\alpha)$, where $\underline{\beta} < \beta^0 < \bar{\beta}$. 

$\underline{\beta}$ and $\bar{\beta}$ are fractions devoted to doses which yield designs which are 80% efficient 
(efficiency measured in terms of asymptotic variance). $\sigma^2(\hat\alpha)$ is the asymptotic variance 
using $\beta^0$. The relatively large spread between $\underline{\beta}$ and $\bar{\beta}$ is an indication of the wide 
applicability of the local solution. 


In Fig. 1, $\underline{\beta}$, $\beta^0$, $\beta^*$ and $\bar{\beta}$ are drawn as functions of $\lambda$ for $\alpha = 0.10$ and s = 30. Similarly, 
in Fig. 2, $\underline{\beta}$, $\beta^0$, $\beta^*$ and $\bar{\beta}$ are obtained for a second value of $\alpha$. These figures indicate the 
goodness of the approximation of $\beta^*$ to $\beta^0$. 


Fig. 2. Comparison of $\beta$ values as a function of $\lambda$. 


(4) The solution provides a very simple computational method of estimating $\alpha$. 
If all the test animals are given the same dose D, the proportion $\hat p$ of test animals which 
do not react is the maximum-likelihood estimate of $e^{-\alpha D}$. Hence, the estimate of $\alpha$ is given by 

$$\hat\alpha = -\frac{1}{D}\log_e \hat p.$$




Table 1. Various $\beta$ values and the minimum asymptotic variance of $\hat\alpha$ for the case of s = 30 test animals 

Key: $\underline{\beta}$, lower 80% efficiency limit; $\beta^0$, optimal value of $\beta$; $\bar{\beta}$, upper 80% efficiency limit; $\sigma^2(\hat\alpha)$, minimum asymptotic variance of $\hat\alpha$. 

[Table body, giving $\underline{\beta}$, $\beta^0$, $\bar{\beta}$ and $\sigma^2(\hat\alpha)$ for each pair of values of $\alpha$ and $\lambda$, is not legible in this copy.] 


Should a more complicated design be used, the calculation of the maximum-likelihood 
estimates would be much more difficult and involved. (It must be pointed out that if there 
are serious doubts about the applicability of the exponential model, the biologist cannot 
rely on a design giving each test animal the same nominal dose.) 
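
The computational simplicity claimed in (4) is easy to see in code. The sketch below assumes the common nominal dose D is treated as known; in the full design it would itself rest on the plate-count estimate. The function name and the example counts are hypothetical.

```python
import math

def estimate_alpha(n_not_reacting, n_animals, nominal_dose):
    """Estimate alpha from the proportion of animals that do not react.

    Uses p_hat as the estimate of exp(-alpha * D), so alpha_hat = -log(p_hat) / D.
    """
    p_hat = n_not_reacting / n_animals
    if p_hat <= 0.0:
        raise ValueError("every animal reacted; the estimate is not finite")
    return -math.log(p_hat) / nominal_dose

# Hypothetical outcome: 6 of 30 animals fail to react at a nominal dose of 8 organisms.
alpha_hat = estimate_alpha(6, 30, 8.0)
print(alpha_hat)
```

When every animal reacts the proportion is zero and no finite estimate exists, which the sketch reports as an error rather than returning a value.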


The computations and figures in this paper were prepared under the supervision of 
Aloise Askin and Gladys Garabedian. 


MATHEMATICAL APPENDIX 
Allocation of doses 
Denote by $\beta$ the fraction of the bacterial material reserved for the dosage part of the 
experiment. Let $f_i$ be the proportion administered to the ith animal. Then 

$$\beta = \sum_{i=1}^{s} f_i,$$

and the nominal dose received by the ith animal is $f_i\lambda$. 


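Since our design is to be locally optimal, we shall define the optimal design as that one 
which provides the maximum information relative to $\alpha$ (the information matrix defined 
by Fisher is given by 

$$\left\| -E\left\{\frac{\partial^2 \log f(x,\theta)}{\partial\theta_i\,\partial\theta_j}\right\} \right\| = \left\| E\left\{\frac{\partial \log f(x,\theta)}{\partial\theta_i}\,\frac{\partial \log f(x,\theta)}{\partial\theta_j}\right\} \right\|,$$

where f is the probability density and $\theta = (\alpha, \lambda)$ is the vector parameter to be estimated). 
Since the random variable Z (the number of bacterial colonies developing in a culture 
medium) observed during the plate count has a Poisson law of probability with expectation 
$(1-\beta)\lambda$, it follows that the information matrix is 

$$\begin{pmatrix} \lambda^2 \sum_{i=1}^{s} \dfrac{f_i^2 p_i}{1-p_i} & \alpha\lambda \sum_{i=1}^{s} \dfrac{f_i^2 p_i}{1-p_i} \\[2ex] \alpha\lambda \sum_{i=1}^{s} \dfrac{f_i^2 p_i}{1-p_i} & \dfrac{1-\beta}{\lambda} + \alpha^2 \sum_{i=1}^{s} \dfrac{f_i^2 p_i}{1-p_i} \end{pmatrix}, \tag{6}$$

where $p_i = e^{-f_i\alpha\lambda}$. 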

For the moment let us assume the following properties of the function 

$$g(u) = \frac{u^2 e^{-u}}{1-e^{-u}} \tag{7}$$

for positive u. 
LEMMA 1. (a) The equation $g'(u) = 0$ has a unique positive root at $u \approx 1.6$ (the corresponding 
value of $1-e^{-u}$ is 0.8), and (b) g(u) is concave for $0 < u < 3.08$. 
It follows immediately from (a) and (1) that for an optimal design all $f_i\alpha\lambda$ should be less 
than 1.6. From (b) it follows that if $0 < u_i < 3$ for all i 

$$\frac{1}{s}\sum_{i=1}^{s} g(u_i) \le g\left(\frac{1}{s}\sum_{i=1}^{s} u_i\right), \tag{8}$$

and therefore we have: 
THEOREM 1. For the model considered here, all the animals should be given the same dose. 
It remains only to prove the lemma and to determine the optimal $\beta$. 
Proof of Lemma. Part (a): 

$$g'(u) = \frac{2u\,e^{-u}\left[(1-e^{-u}) - \tfrac12 u\right]}{(1-e^{-u})^2}. \tag{9}$$

It suffices to show that the equation 

$$(1-e^{-u}) - \tfrac12 u = 0 \tag{10}$$

has a unique positive root. But this follows from the fact that the expression on the left is 
concave and vanishes at u = 0. 
Part (b): 

$$g''(u) = (e^{u}-1)^{-3}\left[e^{2u}(u^2-4u+2) + e^{u}(u^2+4u-4) + 2\right].$$

Expanding $e^{2u}$ and $e^{u}$ in a Taylor series we get 

$$g''(u) = (e^{u}-1)^{-3}\sum_{j=3}^{\infty} \frac{a_j u^j}{j!},$$

with $a_j = j(j-9)2^{j-2} + 4(2^{j-1}-1) + j^2 + 3j$, so that $a_j < 0$ if $j = 3, \ldots, 7$ and $a_j > 0$ if $j = 8, 9, \ldots$. 
From this it follows that $g''(u)$ has only one positive root (which is computed to be approximately 
3.086) and (b) follows. 
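
Both numerical constants in the lemma can be reproduced by bisection. The sketch below works with the bracketed factors only, since the remaining factors of $g'(u)$ and $g''(u)$ are positive for positive u; the function names and the bracketing intervals are assumptions chosen for illustration.

```python
import math

def bisect(func, lo, hi, tol=1e-12):
    """Locate a sign change of func on [lo, hi] by repeated halving."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def first_derivative_factor(u):
    """Left side of equation (10); g'(u) has the same sign for u > 0."""
    return (1.0 - math.exp(-u)) - 0.5 * u

def second_derivative_factor(u):
    """Bracketed factor of g''(u); (e^u - 1)^(-3) is positive for u > 0."""
    return (math.exp(2 * u) * (u * u - 4 * u + 2)
            + math.exp(u) * (u * u + 4 * u - 4) + 2.0)

root_a = bisect(first_derivative_factor, 1.0, 2.0)
root_b = bisect(second_derivative_factor, 2.0, 4.0)
print(round(root_a, 4), round(1.0 - math.exp(-root_a), 4), round(root_b, 3))
```

The first root is close to 1.6, at which about 80% of animals respond, and the second is close to 3.086, the end of the interval on which g is concave.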

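Suppose now that each animal is given the nominal dose $\beta\lambda/s$. The maximum-likelihood 
estimates of $\lambda$ and $\alpha$ are 

$$\hat\lambda = \frac{Z}{1-\beta} \tag{11}$$

and 

$$\hat\alpha = -\frac{s}{\beta\hat\lambda}\log \hat p = -\frac{s(1-\beta)}{\beta Z}\log \hat p, \tag{12}$$

where Z is the number of colonies developed during the plate count and $\hat p$ is the proportion 
of the animals which did not respond. 
Studying $\hat\alpha$ directly or applying (6) it follows that the asymptotic variance of $\hat\alpha$ is 

$$\sigma^2(\hat\alpha) = \frac{\alpha^2}{(1-\beta)\lambda} + \frac{s(1-p)}{\beta^2\lambda^2 p},$$

where $p = e^{-\alpha\beta\lambda/s}$. 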

The optimal value of $\beta$ can be tabulated as a function of $\alpha$ and $\lambda$. Hence designs of this 
type can only be hoped to have local optimal properties. However, in many situations 
some a priori knowledge of $\lambda$, $\alpha$ is available which should enable one to get a fairly efficient 
design. 

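Examining $\sigma^2(\hat\alpha)$ as a function of $\beta$ we see that for small values of $\alpha\lambda/s$ 

$$\sigma^2(\hat\alpha) \approx \frac{\alpha^2}{(1-\beta)\lambda} + \frac{\alpha}{\beta\lambda} = \frac{\alpha}{\lambda}\left[\frac{\alpha}{1-\beta} + \frac{1}{\beta}\right], \tag{13}$$

which is a minimum for $\beta = 1/(1+\sqrt{\alpha})$. 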
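
The location of this minimum can be confirmed in one line of calculus. The following sketch takes the small-dose approximation to the variance to be proportional to $\alpha/(1-\beta) + 1/\beta$, which is what the two variance terms reduce to when $e^{\alpha\beta\lambda/s} - 1$ is replaced by $\alpha\beta\lambda/s$, and differentiates with respect to $\beta$.

```latex
\begin{align*}
\frac{d}{d\beta}\left[\frac{\alpha}{1-\beta}+\frac{1}{\beta}\right]
  = \frac{\alpha}{(1-\beta)^2}-\frac{1}{\beta^2} = 0
  \;\Longrightarrow\; \frac{\beta}{1-\beta} = \frac{1}{\sqrt{\alpha}}
  \;\Longrightarrow\; \beta = \frac{1}{1+\sqrt{\alpha}} .
\end{align*}
```

The second derivative, $2\alpha/(1-\beta)^3 + 2/\beta^3$, is positive on $0 < \beta < 1$, so the stationary point is indeed a minimum.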
For large values of $\lambda$ there should be enough material to give each animal a dose of approximately 
$ED_{80}$, which is the optimal single dose as can be determined again from the information 
function. If this is the case, setting 

$$\beta\alpha\lambda/s = 1.6$$

will yield an approximately optimal value for $\beta$ when $\lambda$ is large. For an overall approximation 
to optimal $\beta$ we propose setting 

$$\beta^* = \min\left[\frac{1}{1+\sqrt{\alpha}},\ \frac{(1.6)\,s}{\alpha\lambda}\right].$$
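
The proposed approximation is simple enough to compute by hand, but a short function makes the two regimes explicit. This is a minimal sketch; the name `beta_star` and the argument names are illustrative, and the first set of inputs is the introductory example (about 40 organisms expected in the tube, $\alpha$ about 0.2, thirty animals).

```python
import math

def beta_star(alpha, lam, s):
    """Approximate optimal fraction of the material devoted to doses.

    alpha: probability that a single organism causes a response
    lam:   expected number of organisms in the whole test-tube
    s:     number of test animals
    """
    scarce_material = 1.0 / (1.0 + math.sqrt(alpha))   # few organisms available
    ample_material = 1.6 * s / (alpha * lam)           # enough for a dose of 1.6/alpha each
    return min(scarce_material, ample_material)

print(round(beta_star(0.2, 40.0, 30), 2))     # introductory example
print(round(beta_star(0.04, 2500.0, 30), 2))  # a case with ample material
```

In the introductory example the first term is the smaller, giving about 0.69 of the 10 ml., that is 6.9 ml. for doses and 3.1 ml. for the plate count.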




Table 1 gives, for various values of $\alpha$, $\lambda$, the actual value of $\beta$ minimizing $\sigma^2(\hat\alpha)$ for s = 30, 
the upper and lower limits of the values of $\beta$ giving at least 80% efficiency, and the minimum 
value of $\sigma^2(\hat\alpha)$. If Table 1 is to be used for values of s other than s = 30 the conversion formula 

$$\sigma^2(\hat\alpha \mid s, \lambda, \alpha, \beta) = \gamma\,\sigma^2(\hat\alpha \mid \gamma s, \gamma\lambda, \alpha, \beta) \tag{14}$$

is useful, where $\sigma^2(\hat\alpha \mid s, \lambda, \alpha, \beta)$ is the asymptotic variance of $\hat\alpha$ when s, $\lambda$, $\alpha$, $\beta$ are given. 
For example, if s = 60, $\lambda$ = 5000, $\alpha$ = 0.04 and the minimum asymptotic variance and 
optimal $\beta$ are desired, we see from Table 1 that for s = 30, $\lambda$ = 2500, $\alpha$ = 0.04, the optimal 
$\beta$ is $\beta^0 = 0.47$. From equation (14) we find that 

$$\sigma^2(\hat\alpha \mid s = 60, \lambda = 5000, \alpha = 0.04, \beta = 0.47) = \tfrac12\,\sigma^2(\hat\alpha \mid s = 30, \lambda = 2500, \alpha = 0.04, \beta = 0.47) = \tfrac12 \cdot 10^{-4}(0.835757).$$

The optimal value of $\beta$ is 0.47 and the upper and lower 80% efficiency limits on $\beta$ are 0.77 
and 0.26, respectively. 
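
The worked example can be checked against the asymptotic variance directly. The sketch below assumes the variance has the two-term form $\alpha^2/\{(1-\beta)\lambda\} + s(1-p)/(\beta^2\lambda^2p)$ with $p = e^{-\alpha\beta\lambda/s}$, one term from the plate count and one from the animals; the function name is hypothetical.

```python
import math

def asymptotic_variance(alpha, lam, s, beta):
    """Asymptotic variance of the estimate of alpha when every animal gets beta*lam/s."""
    p = math.exp(-alpha * beta * lam / s)           # chance that an animal does not respond
    plate_count_term = alpha ** 2 / ((1.0 - beta) * lam)
    animal_term = s * (1.0 - p) / (beta ** 2 * lam ** 2 * p)
    return plate_count_term + animal_term

v30 = asymptotic_variance(0.04, 2500.0, 30, 0.47)
v60 = asymptotic_variance(0.04, 5000.0, 60, 0.47)
print(v30, v60, v60 / v30)
```

Under this assumption the value for s = 30 and $\lambda$ = 2500 agrees with the tabulated $10^{-4}(0.835757)$, and doubling both s and $\lambda$ halves the variance, as the conversion formula requires.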

If $\beta = \beta^*$ is used as an approximation to the optimal value of $\beta$, we see that a design of at 
least 80% efficiency for all $\alpha$, $\lambda$ and an extremely efficient design for small $\lambda$ and large $\lambda$ 
result as can be seen in two instances in Figs. 1 and 2. 


REFERENCES 


CLIFTON, C. E. (1950). Introduction to the Bacteria. New York: McGraw Hill Book Co. 

COCHRAN, W. G. (1950). Estimation of bacterial densities by means of the 'most probable number'. 
Biometrics, 6, 105. 

DRUETT, H. A. (1952). Bacterial invasion. Nature, Lond., 170, 288. 

GOLDBERG, L. J. & WATKINS, H. M. S. (1952). A mathematical expression regarding mortality in the 
albino mouse to respiratory exposure with pathogenic micro-organisms. Bact. Proc. Amer. Soc. 
Bact. p. 74. 

HALEY, K. D. C. (1953). Estimation of the dosage mortality relationship when the dose is subject to 
error. Unpublished Ph.D. dissertation, Stanford University Library. 

PETO, S. (1953). A dose-response equation for the invasion of micro-organisms. Biometrics, 9, 320. 

WORCESTER, J. (1954). How many organisms? Biometrics, 10, 227. 




AN EXACT TEST FOR CORRELATION BETWEEN TIME SERIES 


By E. J. HANNAN 
Australian National University, Canberra, А.С.Т. 


1. INTRODUCTION 


In a previous paper, Hannan (1955) in the process of obtaining an exact test for the serial 
correlation in the residuals from a regression, an estimator of the regression coefficient was 
also derived whose distributional properties, when the residual is a simple Gaussian Markoff 
process, are the same as those of the estimates of the regression coefficient in the classic 
cases. It was then shown that when the regressor is also generated by a simple Markoff 
process the regression coefficient will, for certain values of the serial correlation of the 
residual and the regressor, be a more efficient estimator than that obtained by straight- 
forward least squares. It follows that in these cases the test of significance of the regression 
coefficient provides a test of the correlation between the two series which is asymptotically 
more powerful than the approximate test based on the sample correlation coefficient. 

Alternative approximate tests for correlation between two series, in the form of partial 
correlations with the effects of lagged values of one or both variates removed, were pro- 
posed by Quenouille (1949). In the next section of this paper the efficiencies of Quenouille’s 
tests are compared with those of the two above-mentioned tests when the residuals from the 
regression come from a simple Gaussian Markoff process. 

In later sections the efficiencies of these tests are considered in other cases, some of which, 
from the point of view of a test for correlation, appear more interesting. 


2. THE EXACT TEST WHEN THE RESIDUALS FROM THE REGRESSION COME FROM A SIMPLE 
MARKOFF PROCESS INDEPENDENT OF THE REGRESSOR PROCESS 


We will consider the regression 

$$y_t = \beta x_t + \epsilon_t \qquad (t = 1, \ldots, n), \tag{1}$$

where (1) $\epsilon_t = \rho_1\epsilon_{t-1} + \eta_t$ and $\eta_t$ is $N\{0, \sigma_1(1-\rho_1^2)^{1/2}\}$, (2) $\epsilon_t$ is independent of $x_s$ for all t and s. 
In a previous paper (Hannan, 1955) it was shown that an estimator of $\beta$ could be obtained 
by considering the regression 

$$y_{2t} = \frac{\rho_1}{1+\rho_1^2}\,(y_{2t-1} + y_{2t+1}) + \beta x_{2t} - \frac{\beta\rho_1}{1+\rho_1^2}\,(x_{2t-1} + x_{2t+1}) + \xi_t \qquad (t = 1, \ldots, [\tfrac12(n-1)]).$$

Here $\xi_t$ comes from a process of independent random variates with zero mean and 
variance $\sigma_1^2(1-\rho_1^2)/(1+\rho_1^2)$ and $[\tfrac12(n-1)]$ is the greatest integer less than or equal to $\tfrac12(n-1)$. 
The estimator, $b_1$, of $\beta$ obtained from the sample regression coefficient of $x_{2t}$ will have the 
usual properties of least-squares estimates in the normal case so that the corresponding 
partial correlation between $y_{2t}$ and $x_{2t}$ will provide an exact test of the hypothesis that the 
two series are uncorrelated. 

When $x_t$ also comes from a simple Markoff process, with parameter $\rho_2$ ($|\rho_2| < 1$), the 
variance of the estimator, $b_1$, was shown to be asymptotically 

$$\frac{2\sigma_1^2(1-\rho_1^2)(1+\rho_2^2)}{n\sigma_2^2(1+\rho_1^2)(1-\rho_2^2)}.$$

Here $\sigma_2^2$ is the variance of $x_t$. 
It is well known (Wold, 1953, p. 211) that the variance of the straightforward least-squares 
estimate, $b_2$, of $\beta$ in the regression (1) is, asymptotically, 

$$\frac{\sigma_1^2(1+\rho_1\rho_2)}{n\sigma_2^2(1-\rho_1\rho_2)}.$$

Alternative tests for correlation between two serially correlated series were proposed by 
Quenouille (1949). In the case of correlation between two Markoff processes he recommended 
the use of 

$$r(x_2y_2 \mid x_1), \qquad r(x_2y_2 \mid y_1) \qquad\text{or}\qquad r(x_2y_2 \mid x_1y_1),$$

where $r(x_2y_2 \mid x_1)$, for example, is the partial correlation between $x_t$ and $y_t$ with the effects 
of $x_{t-1}$ removed. Asymptotically such statistics will have variances $n^{-1}$, when there is no 
correlation between the two series, since at least one of the series of residuals which are 
being correlated will approach independence. Quenouille showed, by a sampling experiment, 
that when the two series are uncorrelated the variances are approximately $n^{-1}$, 
even for small samples, and the bias is small. The best statistic, on these grounds, appeared 
to be $r(x_2y_2 \mid x_1y_1)$. However, the efficiency of the statistics, as test statistics, depends not 
only on their variance on the null hypothesis but also on their distribution when the null 
hypothesis is not true. In particular, their expected values are relevant. A criterion of the 
asymptotic efficiency of a test based on a statistic t, of an hypothesis specified by a parameter 
value $\theta_0$, is provided by the limit as $n \to \infty$ of the quantity (Stuart, 1954; Mood, 1954) 

$$\left[\frac{\{\partial E(t)/\partial\theta\}^2}{\operatorname{var}(t)}\right]_{\theta=\theta_0}. \tag{2}$$

The limit of the ratio of these quantities for two statistics $t_1$ and $t_2$ has been called (Pitman, 
1948) the asymptotic relative efficiency of the tests and may be denoted by $E(t_1, t_2 \mid \theta_0)$ 
(where the expression (2) for $t_1$ is in the numerator). 

As mentioned, Quenouille was considering the correlation between two Markoff processes 
when he proposed his statistics. In the present case $y_t$ will not be Markovian if $\rho_1 \neq \rho_2$ and 
$\beta \neq 0$. It is, however, of interest to examine Quenouille's statistic in the present case also. 
(It is unlikely that, in practice, partial correlations of higher order than $r(x_2y_2 \mid x_1y_1)$ will 
be used.) 
We have 

$$\lim_{n\to\infty}\left[\frac{\partial}{\partial\beta}E\{r(x_2y_2 \mid x_1)\}\right]_{\beta=0} = \lim_{n\to\infty}\operatorname{cov}\{r(x_2y_2 \mid x_1),\ y'\Gamma_1^{-1}x\},$$

when $\beta$ is zero (so that the two processes are independent). Here y and x are vectors of 
the n elements $y_t$ and $x_t$ and $\Gamma_1$ is the covariance matrix of the residuals $\epsilon_t$. The differentiation 
under the integral sign is justified by the uniform convergence of the resulting integral. 

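The limit of the covariance can be evaluated by straightforward methods (making use of 
an obvious extension of the theorem quoted in Cramér, 1946, p. 353) and gives 

$$\lim_{n\to\infty}\left[\frac{\partial}{\partial\beta}E\{r(x_2y_2 \mid x_1)\}\right]_{\beta=0} = \frac{\sigma_2}{\sigma_1}(1-\rho_2^2)^{1/2}.$$

Similarly 

$$\lim_{n\to\infty}\left[\frac{\partial}{\partial\beta}E\{r(x_2y_2 \mid x_1y_1)\}\right]_{\beta=0} = \frac{\sigma_2}{\sigma_1}\left(\frac{1-\rho_2^2}{1-\rho_1^2}\right)^{1/2}.$$

Since these two statistics have the same limiting variance on the null hypothesis 
$r(x_2y_2 \mid x_1y_1)$ is always asymptotically the more efficient. 

We now have 

$$E(b_1, b_2 \mid \beta=0) = \frac{(1+\rho_1\rho_2)(1+\rho_1^2)(1-\rho_2^2)}{2(1-\rho_1\rho_2)(1-\rho_1^2)(1+\rho_2^2)},$$

$$E\{r(x_2y_2 \mid x_1y_1), b_2 \mid \beta=0\} = \frac{(1-\rho_2^2)(1+\rho_1\rho_2)}{(1-\rho_1^2)(1-\rho_1\rho_2)},$$

$$E\{r(x_2y_2 \mid x_1y_1), b_1 \mid \beta=0\} = \frac{2(1+\rho_2^2)}{1+\rho_1^2}.$$

The first of these ratios was shown for certain values of $\rho_1$ and $\rho_2$ in Table 3 in Hannan 
(1955).* The second ratio is shown for the same values of $\rho_1$ and $\rho_2$ in Table 1. 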


Table 1. $E\{r(x_2y_2 \mid x_1y_1), b_2 \mid \beta=0\}$ 


   $\rho_2$ \ $\rho_1$     0.4     0.6     0.8 


   0             1.19    1.56    2.78 
   0.2           1.34    1.91    3.68 
   0.4           1.38    2.14    4.53 
   0.6           1.24    2.13    5.06 
   0.8           0.83    1.60    4.56 
0:83 1-60 456 


It is evident that, while $b_1$ will be asymptotically more powerful than $b_2$ when $\rho_1$ is sufficiently 
high it is always asymptotically less efficient than $r(x_2y_2 \mid x_1y_1)$. It has the advantage, 
however, of giving an exact test. 

It is easily shown that when $x_t$ is Markovian $-E(\partial^2 \log L/\partial\beta^2)_{\beta=0}$ tends to 

$$\{n\sigma_2^2(1+\rho_1^2-2\rho_1\rho_2)\}/\{\sigma_1^2(1-\rho_1^2)\}$$

as n increases; while the corresponding quantities formed from the mixed second derivatives 
of log L with respect to $\beta$ and each of the remaining parameters tend to zero. 

Here L is the likelihood function. It follows (Whittle, 1953) that the variance of the 
maximum-likelihood estimator of $\beta$ is, asymptotically, when $\beta = 0$ 

$$\frac{\sigma_1^2(1-\rho_1^2)}{n\sigma_2^2(1+\rho_1^2-2\rho_1\rho_2)}.$$

None of the statistics considered in this section are, therefore, asymptotically fully 
efficient in general, though $r(x_2y_2 \mid x_1y_1)$ is asymptotically fully efficient when $\rho_1 = \rho_2$. 


* There $\rho$ and $\rho_1$ were used in place of $\rho_1$ and $\rho_2$ respectively. 


One can, of course, suggest tests which are asymptotically fully efficient and which do not 
require the solution of non-linear systems of equations as the maximum-likelihood estimator 
does. Such a test will be obtained by using a consistent estimator of the serial correlation 
of the residuals to transform the regression equation to a form with an approximately 
random residual (Cochrane & Orcutt, 1949). Apart from the fact that the criterion of 
asymptotic relative efficiency is only suggestive of the relative power of tests in small 
samples it has an added deficiency when applied to tests which are not exact, for it does not 
take into account the effect of the deviation of the significance point used from the true 
value appropriate to the level of significance required. In the case of Quenouille's tests his 
sampling experiments seem to show that on the null hypothesis the mean and variance of 
his statistics in small samples are near to their theoretical values so that the significance 
point used will be approximately correct. 

Example 1. Two series of seventy observations were chosen from series 7 in Kendall 
(1949), the last term in one series being sufficiently far from the first in the next to make the 
two series effectively independent. If the two series are represented by $x_t$ and $z_t$ a third 
series which was formed can be represented by 

$$y_t = 0.8\,x_t + z_t.$$

The commencing term in the series $x_t$ is no. 191 and in $z_t$ no. 316. 


Table 2 

   Statistic                 Sample value    Population value    Test statistic 
   $b_1$                     0.994           0.8                 $t_{30}$ = 5.21** 
   $b_2$                     0.417           0.8 
   $r(x_2y_2 \mid x_1)$      0.316           0.329 
   $r(x_2y_2 \mid x_1y_1)$   0.684           0.625               r = 0.684 


** Highly significant. 


The three series now are Gaussian Markoff processes with first serial correlation 
equalling 0.9. 
The details of the various tests are given in Table 2. 
Here $t_{30}$ indicates Student's t with 30 degrees of freedom, while $r_m$ indicates a correlation 
coefficient with m degrees of freedom. 
The observed first serial correlation coefficients were 

$$r_{x,1} = 0.755, \qquad r_{y,1} = 0.739.$$

The degrees of freedom appropriate to $r_{xy}$ were calculated from the formula 

$$m = 70\,\frac{1-(0.755)(0.739)}{1+(0.755)(0.739)} \approx 20.$$

The statistic $r_{xy}$ is not significant at the 5% point. 
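
The degrees-of-freedom adjustment is a one-line calculation. The sketch below assumes the formula has the form $n(1 - r_1r_2)/(1 + r_1r_2)$, with n = 70 observations and the two observed first serial correlations 0.755 and 0.739 quoted in the example; the function name is hypothetical.

```python
def effective_degrees_of_freedom(n, serial_r1, serial_r2):
    """Degrees of freedom for a correlation between two serially correlated series."""
    product = serial_r1 * serial_r2
    return n * (1.0 - product) / (1.0 + product)

m = effective_degrees_of_freedom(70, 0.755, 0.739)
print(round(m, 1))
```

Serial correlation of this strength reduces seventy observations to roughly twenty effective degrees of freedom, which is why the ordinary correlation coefficient fails to reach significance here.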


3. AN EXACT TEST FOR CORRELATION BETWEEN TWO 
SIMPLE MARKOFF PROCESSES 


Let 

$$(y_t - \mu_1) = \rho_1(y_{t-1} - \mu_1) + \epsilon_t,$$

$$(x_t - \mu_2) = \rho_2(x_{t-1} - \mu_2) + \eta_t$$

be two simple Markoff processes with normally distributed disturbances having zero means 
and variances $\sigma_1^2(1-\rho_1^2)$ and $\sigma_2^2(1-\rho_2^2)$. 

The correlation between these processes can be prescribed in terms of the correlation 
properties of the residuals. Since these two residuals come from processes of independent 
random variates this will be achieved by saying for what lag, if any, the two residuals are 
correlated and what these correlations are. 

We will consider the case where the only non-zero correlation is between $\epsilon_t$ and $\eta_{t+m}$ and 
the series have been so lagged relative to each other that m is zero. 

If the correlation between $\epsilon_t$ and $\eta_t$ is $\rho$, then the correlation between $x_s$ and $y_t$ is 

$$t \ge s:\quad \alpha\rho_1^{\,t-s}, \qquad\qquad t < s:\quad \alpha\rho_2^{\,s-t},$$

where 

$$\alpha = \frac{\rho\,(1-\rho_1^2)^{1/2}(1-\rho_2^2)^{1/2}}{1-\rho_1\rho_2}.$$

The two series $y_t$ and $x_t$ will be jointly normally distributed for every t so that 

$$(y_t - \mu_1) = \alpha\frac{\sigma_1}{\sigma_2}(x_t - \mu_2) + \zeta_t,$$

where $\zeta_t$ is independent of $x_t$ and has mean zero and variance $\sigma_1^2(1-\alpha^2)$. 

Then a simple calculation shows that $E(\zeta_t\zeta_{t-u}) = \sigma_1^2(1-\alpha^2)\rho_1^u$. It follows, since $\zeta_t$ is 
normally distributed, that it is generated by a simple Gaussian Markoff process with parameter 
$\rho_1$ (Doob, 1953, p. 233). The disturbance $\theta_t$ in the process $\zeta_t = \rho_1\zeta_{t-1} + \theta_t$ is 

$$\theta_t = \alpha\frac{\sigma_1}{\sigma_2}(\rho_1 - \rho_2)(x_{t-1} - \mu_2) + \epsilon_t - \alpha\frac{\sigma_1}{\sigma_2}\eta_t.$$

If $\rho_1 \neq \rho_2$ and $\rho \neq 0$, $\theta_t$ depends upon $x_{t-1}$ and $\zeta_t$ is not independent of $x_s$ for $t \neq s$. The $\theta_t$ are 
of course independent. 

When $\rho_1 = \rho_2$ the present case is equivalent to that considered in the previous 
section. When $\rho_1 \neq \rho_2$ the exact test of significance of the correlation between the two 
series there given is still an exact test, but the conclusions as to the power of this 
test and the others there considered, when $x_t$ also comes from a Markoff process, do not 
apply, for when $\rho \neq 0$ the condition (2) upon which the derivation depended does not 
hold. 

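The conditional distribution of the $y_{2t}$ ($t = 1, \ldots, [\tfrac12(n-1)]$), for fixed $x_t$ ($t = 1, \ldots, n$) and 
$y_{2t+1}$ ($t = 0, \ldots, [\tfrac12(n-1)]$) is the product of the conditional distributions of each $y_{2t}$ for fixed 
$y_{2t-1}$, $y_{2t+1}$, $x_{2t-1}$, $x_{2t}$ and $x_{2t+1}$. This follows from the fact that for fixed $y_{2t+1}$ and $x_{2t+1}$ the $y_{2t}$ 
and $x_{2t}$ are serially independent (Ogawara, 1951). This conditional distribution can be found 
from the correlation matrix of the variates $y_{2t-1}$, $y_{2t+1}$, $x_{2t-1}$, $x_{2t}$, $x_{2t+1}$, $y_{2t}$ (taken in that order), 
which is 

$$\begin{pmatrix} A & B \\ B' & 1 \end{pmatrix},$$

$$A = \begin{pmatrix} 1 & \rho_1^2 & \alpha & \alpha\rho_2 & \alpha\rho_2^2 \\ \rho_1^2 & 1 & \alpha\rho_1^2 & \alpha\rho_1 & \alpha \\ \alpha & \alpha\rho_1^2 & 1 & \rho_2 & \rho_2^2 \\ \alpha\rho_2 & \alpha\rho_1 & \rho_2 & 1 & \rho_2 \\ \alpha\rho_2^2 & \alpha & \rho_2^2 & \rho_2 & 1 \end{pmatrix},$$

$$B' = (\rho_1,\ \rho_1,\ \alpha\rho_1,\ \alpha,\ \alpha\rho_2).$$

Then the variate $y_{2t}$ is conditionally distributed with mean $\mu_1 + \sigma_1\boldsymbol{\eta}'A^{-1}B$ and variance 
$\sigma_1^2\{1 - B'A^{-1}B\}$ (Rao, 1952, p. 54). Here 

$$\boldsymbol{\eta}' = \left(\frac{y_{2t-1}-\mu_1}{\sigma_1},\ \frac{y_{2t+1}-\mu_1}{\sigma_1},\ \frac{x_{2t-1}-\mu_2}{\sigma_2},\ \frac{x_{2t}-\mu_2}{\sigma_2},\ \frac{x_{2t+1}-\mu_2}{\sigma_2}\right).$$

The vector $(A^{-1}B)'$ is easily seen to be 

$$\frac{1}{1+\rho_1^2}\left(\rho_1,\ \ \rho_1,\ \ -\frac{\alpha\rho_2(1-\rho_1\rho_2)}{1-\rho_2^2},\ \ \frac{\alpha(1-\rho_1^2\rho_2^2)}{1-\rho_2^2},\ \ -\frac{\alpha\rho_1(1-\rho_1\rho_2)}{1-\rho_2^2}\right),$$

while 

$$\sigma_1^2\{1 - B'A^{-1}B\} = \frac{\sigma_1^2(1-\rho_1^2)(1-\rho^2)}{1+\rho_1^2}.$$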

For p, = p, these formulae are equivalent to those given in the previous section, when it is 
remembered that there the variance of the residual e, was v1, while here the variance of the 
corresponding residual is a3(1 — p?). 
For $\rho_1 \neq \rho_2$ two things are noticeable:
(1) The regression coefficient of $x_{2t-1}$ is not equal to that for $x_{2t+1}$, so that some of the
symmetry present when $\rho_1 = \rho_2$ is now lost. This loss of symmetry is also evident from the


fact that Ф ра) (Vrs LO} 6 ss — 2) (1 — It) 

when p.p, and 4+0 

and (y Hr) (аз — Из) [959-3 =0 8<0, 
+0 8>0. 


(2) The coefficient of %y is no longer equal to ату/т but instead to 


т | 1—pipà |. 
* e. p) 1-Р) 


The asymptotic variance of the estimator, бу, of this last coefficient is, for p — 0, 


202(1 — )?م‎ (1+P3) 
по(1+р®(1—/®” 


E] 2 
so that the quantity lim. (var oo] ls at m 
2 
n(1 + p103) 




For the ordinary correlation coefficient, r, between z, and y, the corresponding quantity 
is (Bartlett, 1935) 1—pipi 


n(1— pi) (1 — 93) 


The ratio of the second of these quantities to the first (the asymptotic relative efficiency 


of the two tests) is A EL 
Жу |p=0) = 4 — 0169 0t o 


2 (1—1)(1-р8) 
which is evaluated for certain values of p, and p, in Table 3. For E(5,, | p = 0)?» 1 it may be 
inferred that, asymptotically at least, the test based on 5, is superior to that based on r. 


Table 3. E(b,,r|p—0) 


—0-4 —0.2 0 0-2 0-4 0-6 0-8 


0-51 0-50 0-50 0-50 0-51 0-57 0-85 
0-43 0-46 0-50 0-54 0-60 0-71 VTR 
0-36 0-43 0-51 0-60 0-69 0-85 1:36 
0-32 0-44 0-57 0-71 0-85 1-06 1-64 
0:36 0:58 0:85 111 1:36 1:64 2.28 


As in $2 we have 


"uit Е 
да 3p 4 са» | 2) s lim cov (r(zsy; |21), y Гү ! T, Py! xj, 


p= 


the differentiation having been carried out under the integral sign and y now being in- 


dependent of x. Here Г, is the covariance matrix of the уь Г, is the covariance matrix of 
the x, and 


n 5 (0. — n) а) 


The limit of the covariance сап be evaluated in the same manner as in § 2 and gives 


4 д 

Jim [ён |а) = a-o, 
4 д 

lim favs | n) = (1—08), 


«x fd 
Jim [7 Фа» un) = 1. 


Clearly once more r(z, y, | £141) provides the most efficient test statistic. 


It can be shown that, for p = 0, the quantities —& et are asymptotically zero. 
j/p-0 


Here L is the likelihood function and the 0, (j — 1,2,3,4) are the remaining parameters 


1 : o? 
in that function. At p — 0, -« к 5 = n. It therefore appears (Whittle, 1953) that the 


asymptotic variance of the maximum-likelihood estimator of $\rho$ is $n^{-1}$ when $\rho$ is zero. Since
this estimator will be consistent, both $r$ and $b_1$ are asymptotically inefficient as test statistics
when $\rho_1$ and $\rho_2$ are not zero, while $r(x_2 y_2 \mid x_1 y_1)$ is asymptotically fully efficient.

When $\rho_1$ and $\rho_2$ are high and of the same sign $b_1$ will give an exact test of the null hypothesis
which is more efficient than the test based on $r$. However, the test based on $b_1$ is always
much less efficient than that based on $r(x_2 y_2 \mid x_1 y_1)$.

The computational burden of the exact test can of course be reduced by omitting one
or both of $x_{2t-1}$ and $x_{2t+1}$ from the regression and the test will still be exact. It can be shown,
however, that the test will be asymptotically less efficient. Another alternative is to use
$(x_{2t-1} + x_{2t+1})$ in place of $x_{2t-1}$ and $x_{2t+1}$. It can then be shown that the asymptotic efficiency
of the test based on the coefficient of $x_{2t}$ is unchanged. In finite samples, however, the power
of the test may be adversely affected.

The following example illustrates the use of some of the tests mentioned in this section. 

Example 2. In Moran (1952) the correlations between the records of the numbers of certain 
game birds shot over a period of years were considered. The series for caper and ptarmigan 
are considered in this example. 

The application of the test given in Quenouille (1947a) shows that both series are well 
represented by a simple Markoff process. 

The results of the various tests discussed above are shown in Table 4. Here $r_m$ indicates
a correlation coefficient with $m$ degrees of freedom.


Table 4 


Statistic Sample value Test statistic 
T 0-417 ты = 0:417* 
b, 0-444 ty, = 2:484* 
0-387 Te = 0:387*** 


T(*aya| i91) 


* Significant at 5% point. *** Significant at 0-1% point. 


The two observed serial correlations were 0.592 and 0.417 for caper and ptarmigan
respectively, and these were adjusted up to 0.65 and 0.55 by Moran to allow for bias. Using
these last values it appears from Table 3 that $r$ and $b_1$ should give tests of roughly the same
efficiency while $r(x_2 y_2 \mid x_1 y_1)$ should be approximately twice as efficient as either.


4. THE CASE WHERE THE REGRESSOR PROCESS IS A SECOND-ORDER 


AUTOREGRESSIVE PROCESS 

The exact tests for correlation between two series which have been presented in this paper
remain exact provided that the residual from the regression of one variate on the other is
a Markoff process. The efficiency of the tests has been examined in the following cases:

(1) The regressor process is a Markoff process wholly independent of the residual. In
particular this covers the case where the regressand is also a Markoff process with the same
parameter as the residual.

(2) The regressor and regressand processes are both Markovian. When the two parameters
are equal this case is equivalent to the particular case mentioned under (1) above.

If neither of the two processes being correlated can reasonably be said to be Markovian
then, strictly, the exact tests presented here are not applicable, for the residual can then be
Markovian only if there is in truth a relation between the series. (One might apply the tests
as an approximation to an exact test of course.)

A case which may arise in practice is that in which one process appears Markovian while
the other is an autoregressive process of higher order. The more interesting alternative
hypothesis then appears to be one in which the correlation between the series arises from a
correlation between the shocks of which they are linearly composed.

Let
$$(y_t - \mu_1) = \rho_1 (y_{t-1} - \mu_1) + \epsilon_t$$
and
$$(x_t - \mu_2) + a(x_{t-1} - \mu_2) + b(x_{t-2} - \mu_2) = \eta_t$$
be two processes where,

(1) $|\rho_1| < 1$ and all roots of $z^2 + az + b = 0$ are less than unity in absolute value,
(2) $\epsilon_t$ and $\eta_t$ are jointly normally distributed with variances
$\sigma_1^2(1-\rho_1^2)$ and $\sigma_2^2(1-b)\{(1+b)^2 - a^2\}/(1+b)$
and correlation $\rho$.
Table 5. E(b,,r|p=0) 


p17 07 0:6 


Then by straightforward methods it can be shown that 


E(b,,r|p=0) = (1+P1P2—6)?{1 + pip, — (рї + руру)}{1 —P1Pa Брг — 1) 
201—1) (1 — på) (1 —5) {1 + p$ —&(1— p3)} 
Here p, is the first serial correlation of the z, process. 

At $b = 0$ this is equal to the quantity tabulated above. The ratio decreases with $b$, for
most values of $\rho_1$ and $\rho_2$, at least in the neighbourhood of $b = 0$. For sufficiently large values
of $\rho_1$ and $\rho_2$ it will be greater than unity. The ratio is tabulated in Table 5 for certain values
of $\rho_1$, $\rho_2$ and $b$.

It is clear that the statistic r(x3y, | 2,2.) will now give a test which is asymptotically 


fully efficient. The asymptotic efficiency of the statistic т(ж»у» | жуу) can be fairly easily 
evaluated and proves to be 


Ваа | 211), (ess | say) | p = 0) = (1—0). 
The use of (жуу, | z,;) will then result in little loss of information unless b is fairly large. 
Finally - 
By r(zsys |) = (1 +p? ee. 
* pi) (1—6){1—b + p(1-- 0)) 



This is less than unity for most of the range of values of the parameters involved, though
it becomes infinite as $b$ tends to unity.

Example 3. The data for grouse and blackgame from Moran (1952) may be used for 
illustration. The blackgame series appears to be Markovian but the first partial serial 
correlation for the grouse series is 0-34. The first serial correlation for grouse is 0-64 and for 
blackgame 0-43. Using b = 0-3, p, = 0-7 and p, = 05 we obtain 

E(b,.7|p=0) = 002, Bib, riz, y, | жуу, |р = 0) = O47, 
The tests are shown їп Table 6. 


5. GENERALIZATION 


The preceding exact tests can of course be generalized to multiple correlation and analogous 
results can be expected. i 

A further generalization of both the simple = — PO — 
when one process is an autoregressive process of higher process 
order A however only n(h+ 1)-* observations will be used in the correlation (where n is the 
number of observations available), the remainder being used to reduce the series to in- 
dependence in time. The serial correlations would have to be very high indeed before the 


04 : i i and yy when the effects of 
the test statistic, b,, is the partial correlation between ту " _ te 
p asymptotic efficiency of this statistic 
(Jua ursa) 1 and ys have been removed. The А po between i 
is compared with that of the ordinary у алг зүч кзг А Һе cecidi 
the statistic („у | 2,1) suggested by Queno YU . 
(a) The residual process from the regression of $y_t$ on $x_t$ is independent of the $x_t$ process
and comes from a Gaussian Markoff process.
(b) The two series $x_t$ and $y_t$ are Markovian and are correlated through a correlation between
the (Gaussian) errors in the two processes.
(c) As in (b) but with $x_t$ generated by a second-order autoregressive process.
While in all three cases the statistic $b_1$ leads to a test asymptotically more efficient than
$r$ for high serial correlation of the residual process in the regression of $y_t$ on $x_t$, the statistic
$r(x_2 y_2 \mid x_1 y_1)$ is always asymptotically more efficient than $b_1$ with the exception of some
cases, under (c), where the first partial correlation of the $x_t$ process is high and positive.


I should like to thank a referee of this paper for pointing out the importance of 
Quenouille's statistic. 


REFERENCES 


BARTLETT, M. S. (1935). J. R. Statist. Soc. 98, 536.
COCHRANE, D. & ORCUTT, G. H. (1949). J. Amer. Statist. Ass. 44, 32.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
DOOB, J. L. (1953). Stochastic Processes. New York: John Wiley and Sons.
HANNAN, E. J. (1955). Biometrika, 42, 133.
KENDALL, M. G. (1949). Biometrika, 36, 267.
MOOD, A. M. (1954). Ann. Math. Statist. 25, 514.
MORAN, P. A. P. (1952). J. Animal Ecol. 21, 154.
OGAWARA, M. (1951). Ann. Math. Statist. 22, 115.
PITMAN, E. J. G. (1948). Lecture notes on non-parametric inference. University of North Carolina.
QUENOUILLE, M. H. (1947a). J. R. Statist. Soc. 60, 123.
QUENOUILLE, M. H. (1947b). Biometrika, 34, 365.
QUENOUILLE, M. H. (1949). J. R. Statist. Soc. B, 11, 68.
RAO, C. R. (1952). Advanced Statistical Methods in Biometric Research. New York: John Wiley and Sons.
STUART, A. (1954). J. Amer. Statist. Ass. 49, 147.
WHITTLE, P. (1953). J. R. Statist. Soc. B, 15, 125.
WOLD, H. (1953). Demand Analysis. New York: John Wiley and Sons.




SERIAL CORRELATION IN REGRESSION ANALYSIS. I†


By G. S. WATSON
Australian National University, Canberra, A.C.T.


1. INTRODUCTION


The procedure for linear regression analysis when the errors have a multivariate normal
joint distribution with covariance matrix $\sigma^2\alpha$, where $\alpha$ is a given matrix, is well known
(see, for example, Aitken, 1934-5). However, in many practical applications, the matrix $\alpha$
is not completely known. In the analysis of ordered observations, the errors are often serially
correlated. Even if the errors are assumed to be generated by a stationary stochastic process of
the moving-average or autoregressive type of low order, the actual selection of the process
and its parameters is a difficult matter, requiring samples of a size not often available. It is
therefore certain that regression analyses will often be made using the wrong matrix $\alpha$.

Recently Hannan (1955) has given a method which, while not fully efficient, leads to
exact tests when the error process is Markov with unknown $\rho$. This is an advance on previous
writers (e.g. Wold, 1949; Cochrane & Orcutt, 1949), who aimed at the estimation of the
serial correlations of the errors and hence the determination of $\alpha$. But the basic difficulty
still remains: the error process may not be Markov. Furthermore, while the difficulties
are well known, the orders of magnitude of the effects of using the wrong matrix $\alpha$ have
received little precise study except for the case of unequal variances in the analysis of
variance (see, for example, Box, 1954).

For these reasons in Part I of this paper an examination will be made of the performance
of a regression analysis based on the assumption that the error covariance matrix is $\sigma^2\gamma$
when it is, in fact, $\sigma^2\alpha$. The methods used have their origin in the papers of Durbin &
Watson (1950, 1951). When $\gamma \neq \alpha$ the regression vectors play an important role in
determining the efficiency of the regression coefficient estimates and the true significance levels of
significance tests. The procedure adopted is to find, for fixed $\alpha$ and $\gamma$, the regression vectors
which would make the analysis as 'bad' as possible; this leads to inequalities on the bias in
the estimates of variance of the regression coefficients, on the efficiency of the estimates of
the regression coefficients and on the significance points of the various t- and F-tests.

In Part II, these inequalities will be applied to cases where $\sigma^2\alpha$ and $\sigma^2\gamma$ are the covariance
matrices of first- and second-order autoregressive and moving-average processes. By using
approximations to these matrices, it is possible to get simple algebraic and numerical results.

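We consider the linear regression of a dependent variable $y$ on $k$ (linearly independent)
regression variables $x_1, x_2, \ldots, x_k$. The model for a sample of $N$ observations is, in matrix
form,
$$\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{k1} \\ \vdots & & \vdots \\ x_{1N} & \cdots & x_{kN} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} u_1 \\ \vdots \\ u_N \end{bmatrix},$$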


† This work was carried out while the writer was a Research Officer in the Department of Applied
Economics, University of Cambridge. The results of this paper and its sequel were presented at the
Conference of the Econometric Society, Louvain, Belgium in September 1951. It is part of a
thesis which was issued by the Institute of Statistics, University of North Carolina, Mimeograph
Series, no. 49.




or $y = [x_1, \ldots, x_k]\beta + u$,
or
$$y = X\beta + u, \tag{1.1}$$
where the error vector $u$ has zero expectation and
$$E(uu') = \sigma^2\alpha \quad (\alpha \text{ non-singular}). \tag{1.2}$$


$\sigma^2$, $\alpha$ and $\beta$ are supposed unknown. If the statistician, analysing these data, decides a priori
that the error covariance matrix is $\sigma^2\gamma$, he will essentially proceed as follows. Since $\gamma$ is
known it is possible to find a non-singular $N \times N$ matrix $H$ such that
$$H\gamma H' = I. \tag{1.3}$$
Writing $Hy = y^*$, $Hx_i = x_i^*$, $Hu = u^*$, $HX = X^*$,
the result of applying $H$ to equation (1.1) is the transformed regression equation
$$y^* = X^*\beta + u^*, \tag{1.4}$$
where $u^*$ has zero expectation and
$$E(u^*u^{*\prime}) = \sigma^2 H\alpha H'. \tag{1.5}$$


The regression equation (1.4) will now be treated by the least-squares procedures.‡ Thus
in our analysis of the effects of 'guessing' the error covariance matrix, we may without loss
of generality proceed on the assumption that the least-squares procedure has been used
on the model represented by equations (1.1) and (1.2) and merely replace $\alpha$ by $H\alpha H'$ in
the final results.
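As an illustration of (1.3), the sketch below builds one possible $H$ for a particular guessed $\gamma$ and checks numerically that $H\gamma H' = I$. The choice of $\gamma$ as a first-order autoregressive correlation matrix with entries $\rho^{|i-j|}$, and the function names `ar1_corr`, `ar1_whitener`, `matmul` and `transpose`, are assumptions made for this example only; the paper does not prescribe a particular $\gamma$ or $H$.

```python
import math

def ar1_corr(n, rho):
    """Assumed guess gamma: AR(1) correlation matrix, gamma[i][j] = rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def ar1_whitener(n, rho):
    """One matrix H with H gamma H' = I for the AR(1) gamma above.

    Row 0 keeps the first observation; row t forms
    (u_t - rho * u_{t-1}) / sqrt(1 - rho^2).
    """
    s = math.sqrt(1.0 - rho * rho)
    H = [[0.0] * n for _ in range(n)]
    H[0][0] = 1.0
    for t in range(1, n):
        H[t][t - 1] = -rho / s
        H[t][t] = 1.0 / s
    return H

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

gamma = ar1_corr(5, 0.6)
H = ar1_whitener(5, 0.6)
HgH = matmul(matmul(H, gamma), transpose(H))  # should be the 5 x 5 identity
```

Any other non-singular $H$ satisfying (1.3), for example the inverse of a triangular factor of $\gamma$, would serve equally well.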

On these assumptions, the estimate of $\beta$ used would be
$$b = (X'X)^{-1}X'y. \tag{1.6}$$
The covariance matrix of the components of $b$ is actually
$$\sigma^2 (X'X)^{-1} X'\alpha X (X'X)^{-1}, \tag{1.7}$$
but it would be taken as
$$\sigma^2 (X'X)^{-1}, \tag{1.8}$$
where $\sigma^2$ would be estimated by
$$s^2 = \frac{(y - Xb)'(y - Xb)}{N - k}. \tag{1.9}$$
The true expected value of $s^2$ is given by
$$E(s^2) = \frac{1}{N-k} E\{(u - X(X'X)^{-1}X'u)'(u - X(X'X)^{-1}X'u)\}$$
$$= \frac{1}{N-k} E(u'u - u'X(X'X)^{-1}X'u)$$
$$= \frac{\sigma^2}{N-k}\{\operatorname{tr}\alpha - \operatorname{tr} X'\alpha X(X'X)^{-1}\}. \tag{1.10}$$

For arbitrary $X$ this reduces to $\sigma^2$ only when $\alpha = I$.
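A small numerical sketch of (1.10) may make the dependence on $X$ concrete. It is restricted, as an assumption for brevity, to a single regressor ($k = 1$), for which $\operatorname{tr} X'\alpha X (X'X)^{-1} = x'\alpha x / x'x$, and it takes $\alpha$ to be a first-order autoregressive correlation matrix; the names `expected_s2` and `ar1_corr` are hypothetical.

```python
def ar1_corr(n, rho):
    """Assumed true alpha: AR(1) correlation matrix, alpha[i][j] = rho**|i-j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

def expected_s2(x, alpha, sigma2=1.0):
    """E(s^2) from (1.10) for one regressor (k = 1):
    sigma^2 * (tr(alpha) - x'alpha x / x'x) / (N - 1)."""
    n = len(x)
    tr_alpha = sum(alpha[i][i] for i in range(n))
    xax = sum(x[i] * alpha[i][j] * x[j] for i in range(n) for j in range(n))
    xx = sum(v * v for v in x)
    return sigma2 * (tr_alpha - xax / xx) / (n - 1)

n = 20
alpha = ar1_corr(n, 0.5)
identity = ar1_corr(n, 0.0)

constant = [1.0] * n                           # slowly varying regressor
alternating = [(-1.0) ** t for t in range(n)]  # rapidly varying regressor

s2_iid = expected_s2(constant, identity)   # alpha = I: equals sigma^2
s2_const = expected_s2(constant, alpha)    # falls below sigma^2
s2_alt = expected_s2(alternating, alpha)   # rises above sigma^2
```

With positive serial correlation the constant regressor makes $s^2$ too small on average and the alternating regressor makes it too large, which is the dependence on the regression vectors examined in the next section.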


‡ By the 'least-squares procedures' is meant not only the estimation of the regression coefficients
by least squares but also the use of the variance estimates and the t- and F-tests which are appropriate
to the case where the error covariance matrix really is $\sigma^2 I$. We might have introduced the name
'I-procedure' to cover this. Thus when the statistician decides the error covariance matrix is $\sigma^2\gamma$
he will use the '$\gamma$-procedure'.




2. BOUNDS ON THE BIAS OF THE VARIANCE ESTIMATES 


The bias in the estimate of the covariance matrix of the regression coefficient estimates
may be found from (1.7), (1.8) and (1.10) and is given by
$$\sigma^2\left\{\frac{\operatorname{tr}\alpha - \operatorname{tr} X'\alpha X(X'X)^{-1}}{N-k}(X'X)^{-1} - (X'X)^{-1}X'\alpha X(X'X)^{-1}\right\}. \tag{2.1}$$

In order to examine the magnitude of the effect on the variance of a single regression
coefficient we now assume that the regression vectors form an orthonormal set, i.e.
$$X'X = I. \tag{2.2}$$

The condition (2.2) invoked to simplify (2.1) leads to a loss of generality; but, without it,
the extreme value problem is very difficult. Fortunately we will not be so restricted when
we come to discuss significance tests. On this assumption the estimate of $\beta_1$ will be
$$b_1 = x_1'y, \tag{2.3}$$
and the bias in the estimate of its variance will be
$$\sigma^2\left\{\frac{\operatorname{tr}\alpha - \operatorname{tr} X'\alpha X}{N-k} - x_1'\alpha x_1\right\},$$
that is,
$$\sigma^2\left\{\frac{\operatorname{tr}\alpha - \sum_{i=1}^{k} x_i'\alpha x_i}{N-k} - x_1'\alpha x_1\right\}. \tag{2.4}$$


We now require the extremes of the expression (2.4). These will be seen to follow easily
from the following algebraic result: the extremes of the quadratic form $z'\alpha z$, where $z$ is
a unit vector in a subspace spanned by a subset of the latent vectors of $\alpha$, are the least and
greatest latent roots of $\alpha$ associated with the subspace. Thus the lower bound of (2.4) is
obtained by choosing one of the $x$-vectors, say $x_1$, to be the latent vector of $\alpha$ corresponding
to its largest latent root and the others to be the latent vectors corresponding to the next
$k-1$ largest roots. The upper bound is found by a similar choice using the smallest roots
of $\alpha$. More generally if $h$ ($< k$) of the regression vectors are latent vectors of $\alpha$, the remaining
regression vectors lie in an $N-h$ dimensional subspace. If $x_1$ lies in this subspace, the
above results hold good if the choices are made with respect to the set of the $N-h$ latent
roots associated with this subspace. If $x_1$ is a latent vector, a trivial modification is necessary.

Thus for $X$ subject only to (2.2)

(a) Maximum bias = $\sigma^2$((mean of $N-k$ greatest roots of $\alpha$) $-$ (least root of $\alpha$)),
(b) Minimum bias = $\sigma^2$((mean of $N-k$ least roots of $\alpha$) $-$ (greatest root of $\alpha$)). (2.5)


If the multiple $\sigma^2$ is omitted the results are in the nature of bounds on the fractional bias.
It will be noticed that (2.5(a)) is positive and (2.5(b)) negative whatever $\alpha$ may be. It is
commonly believed that the presence of serial correlation makes the variance estimates
deceptively small. The numerical results given in Part II show this to be the stronger
tendency, but (2.5) shows that this need not always be the case. The deciding factor is seen
to be the relationship of the regression vectors to the latent vectors of $\alpha$. Finally, (2.5)
shows how the bias must tend to zero as $\alpha \to I$.
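The bounds (2.5) depend on $\alpha$ only through its latent roots, so they are simple to evaluate once the roots are known. The following sketch assumes the roots are supplied as a plain list; the function name `bias_bounds` and the example roots are hypothetical and chosen only for illustration.

```python
def bias_bounds(roots, k, sigma2=1.0):
    """Return (minimum bias, maximum bias) from (2.5).

    roots  : latent roots of alpha (any order), length N
    k      : number of orthonormal regression vectors, 0 < k < N
    sigma2 : the multiple sigma^2; leave at 1.0 for the fractional bias
    """
    r = sorted(roots)
    n = len(r)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < N")
    m = n - k
    max_bias = sigma2 * (sum(r[-m:]) / m - r[0])   # (2.5)(a)
    min_bias = sigma2 * (sum(r[:m]) / m - r[-1])   # (2.5)(b)
    return min_bias, max_bias

# Unequal roots: the bias may have either sign.
lo, hi = bias_bounds([0.4, 0.7, 1.0, 1.3, 1.6], k=2)

# Equal roots (alpha = I): both bounds collapse to zero.
equal_lo, equal_hi = bias_bounds([1.0] * 5, k=2)
```

For the unequal roots above the bounds are $-0.9$ and $0.9$, while for $\alpha = I$ both are zero, in line with the remark that the bias tends to zero as $\alpha \to I$.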




3. LOWER BOUND TO THE EFFICIENCY OF THE ESTIMATES 


We turn now to a consideration of the efficiency of the estimates (1.6) of the regression
coefficients. It is necessary to determine how this efficiency falls off as $\alpha$ differs more and
more from $I$. To consider the efficiency of the estimation procedure as a whole it is
convenient to introduce the generalized variance of the estimates. The generalized variance of
the estimates $b_i$ of (1.6) is defined to be the determinant of their covariance matrix (1.7).
Aitken (1948) has shown that the linear estimates of $\beta_i$ with the least generalized variance
are those given by $(X'\alpha^{-1}X)^{-1}X'\alpha^{-1}y$; they have the covariance matrix $\sigma^2(X'\alpha^{-1}X)^{-1}$.
The efficiency of the estimates (1.6) is then naturally defined as the ratio of the
determinants of the covariance matrices $\sigma^2(X'\alpha^{-1}X)^{-1}$ and (1.7). Thus
$$\text{Eff.}(b) = \frac{|X'X|^2}{|X'\alpha X|\,|X'\alpha^{-1}X|}. \tag{3.1}$$
It should be noted that Eff.$(b)$ is invariant when $X$ is transformed to $P$ by $P = XC$. It is
always possible to find $C$ so that $P'P = I$. Thus it may be assumed without loss of generality
that $X$ satisfies (2.2).

If all the regression vectors are latent vectors† of $\alpha$, Eff.$(b) = 1$. This may be verified
immediately since only the diagonal terms in $X'X$, $X'\alpha X$ and $X'\alpha^{-1}X$ are then non-zero.
The result finds an application in the work of R. L. & T. W. Anderson (1950) on testing for
circular serial correlation when a short Fourier series has been fitted by least squares. In
this problem the regression vectors are the latent vectors of the quadratic form occurring
in the joint density function of successive observations from a circular autoregressive
process so that the estimates of the regression coefficients are optimal whether serial
correlation, of the type envisaged, is present or not.‡

The expression (3.1) tends of course to unity as $\alpha \to I$. The important question here is
how small Eff.$(b)$ may become, where $\alpha$ is fixed and not equal to $I$ but when $X$ is unrestricted.
This lower bound to the efficiency may be obtained by generalizing an inequality due§
to J. W. S. Cassels. Cassels's inequality states that for fixed $a_j > 0$, $b_j > 0$ and all $w_j \ge 0$
$$\frac{4rR}{(r+R)^2} \le \frac{\left(\sum_j a_j b_j w_j\right)^2}{\sum_j a_j^2 w_j \sum_j b_j^2 w_j} \le 1, \tag{3.2}$$
where $r = \min_j a_j/b_j$, $R = \max_j a_j/b_j$. If the numbering of the $a_j$ and $b_j$ is so arranged that
$r = a_1/b_1$ and $R = a_N/b_N$, then the lower bound is attained for $w_1 = 1/a_1b_1$ and $w_N = 1/a_Nb_N$,
$w_j = 0$ ($j \neq 1, N$). The lower bound in (3.2) is the square of the ratio of the geometric to the
arithmetic mean of $r$ and $R$.
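Cassels's inequality, in the form (Σ a_j b_j w_j)² / (Σ a_j² w_j · Σ b_j² w_j) ≥ 4rR/(r + R)², can be checked numerically. The sketch below is only such a check, not a proof: it tries random non-negative weights and then the extremal weights quoted in the text. The arrays `a` and `b` and the function names are arbitrary choices for the example.

```python
import random

def cassels_ratio(a, b, w):
    """(sum a*b*w)^2 / (sum a^2*w * sum b^2*w) for positive a, b and weights w."""
    sab = sum(x * y * z for x, y, z in zip(a, b, w))
    saa = sum(x * x * z for x, z in zip(a, w))
    sbb = sum(y * y * z for y, z in zip(b, w))
    return sab * sab / (saa * sbb)

def cassels_bound(a, b):
    """Lower bound 4rR/(r+R)^2 with r = min a_j/b_j and R = max a_j/b_j."""
    ratios = [x / y for x, y in zip(a, b)]
    r, R = min(ratios), max(ratios)
    return 4.0 * r * R / (r + R) ** 2

a = [1.0, 2.0, 3.0, 5.0]
b = [2.0, 1.5, 1.0, 0.5]
bound = cassels_bound(a, b)

# Random non-negative weights never fall below the bound.
rng = random.Random(0)
worst = min(cassels_ratio(a, b, [rng.random() for _ in a]) for _ in range(2000))

# Extremal weights: mass only on the indices giving r and R.
ratios = [x / y for x, y in zip(a, b)]
i_min, i_max = ratios.index(min(ratios)), ratios.index(max(ratios))
w_star = [0.0] * len(a)
w_star[i_min] = 1.0 / (a[i_min] * b[i_min])
w_star[i_max] = 1.0 / (a[i_max] * b[i_max])
attained = cassels_ratio(a, b, w_star)  # equals the bound up to rounding
```

The extremal weights reproduce the bound because they give Σ a_j b_j w_j = 2, Σ a_j² w_j = r + R and Σ b_j² w_j = (r + R)/rR.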


To obtain the minimum of Eff.(b) by varying X subject to the restriction that the 
regression vectors are orthogonal, we first eliminate from the problem any regression 


† In contrast to this, the bias matrix (2.1) is not zero when the regression vectors are latent vectors
of $\alpha$.

‡ This result is, however, little more than a mathematical curiosity. For apart from the fact that
the Andersons' test has power against other alternatives, such estimates while optimal cannot be
validly tested in the usual way.

§ A proof of this inequality is given in the Appendix. The inequality and the proof were
communicated privately by Dr Cassels to the author, who wishes to record his gratitude.




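vectors which are latent vectors of $\alpha$. Suppose that $x_{k-h+1}, \ldots, x_k$ are latent vectors of $\alpha$
associated with the latent roots $\alpha_{N-h+1}, \ldots, \alpha_N$. If the remaining $x_i$ have co-ordinates
$z_{it}$ ($t = 1, \ldots, N-h$) when referred to the latent vectors of $\alpha$ as axes
$$\text{Eff.}(b) = \frac{\prod_{i=1}^{k-h}\left(\sum_{t=1}^{N-h} z_{it}^2\right)^2}{\left|\sum_{t=1}^{N-h}\alpha_t z_{it}z_{jt}\right|\,\left|\sum_{t=1}^{N-h}\alpha_t^{-1} z_{it}z_{jt}\right|}. \tag{3.3}$$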

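The matrices $X'\alpha X$ and $X'\alpha^{-1}X$ of (3.1), and therefore the matrices in the denominator of
(3.3), are positive-definite by their form. By a theorem of Hadamard (see Hardy, Littlewood
& Pólya, 1934, p. 34) if $c_{ij}$ are the elements of a positive-definite matrix
$$|c_{ij}| \le c_{11}c_{22}\cdots c_{NN}.$$
Thus
$$\text{Eff.}(b) \ge \prod_{i=1}^{k-h} \frac{\left(\sum_{t=1}^{N-h} z_{it}^2\right)^2}{\sum_{t=1}^{N-h}\alpha_t z_{it}^2 \sum_{t=1}^{N-h}\alpha_t^{-1} z_{it}^2}. \tag{3.4}$$

Without loss of generality we may assume that $\alpha_1 \le \alpha_2 \le \ldots \le \alpha_{N-h}$. We now choose the $z$
vectors to minimize the right-hand side of (3.4) applying Cassels's inequality successively.
When this is done, the matrix $X$ takes the form
$$\left[\frac{a_1 + a_{N-h}}{\sqrt{2}},\ \frac{a_2 + a_{N-h-1}}{\sqrt{2}},\ \ldots,\ \frac{a_{k-h} + a_{N-k+1}}{\sqrt{2}},\ a_{N-h+1},\ \ldots,\ a_N\right],$$
where $a_t$ denotes the latent vector of $\alpha$ associated with the root $\alpha_t$.
Using (3.2) with $a_t = \sqrt{\alpha_t}$, $a_t b_t = 1$ and $w_t = z_t^2$, we have the result†
$$\frac{4\alpha_1\alpha_{N-h}}{(\alpha_1 + \alpha_{N-h})^2}\cdot\frac{4\alpha_2\alpha_{N-h-1}}{(\alpha_2 + \alpha_{N-h-1})^2}\cdots\frac{4\alpha_{k-h}\alpha_{N-k+1}}{(\alpha_{k-h} + \alpha_{N-k+1})^2} \le \text{Eff.}(b) \le 1. \tag{3.5}$$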

Thus in spite of the use of Hadamard's inequality, we have arrived at a lower bound for
Eff.$(b)$ which is attainable. This is so because we were led to a choice of $X$ for which $X'X$,
$X'\alpha X$ and $X'\alpha^{-1}X$ are diagonal. The general rule for the formation of the lower bound to
the efficiency is evident. From the $N-h$ roots of $\alpha$ not associated with any regression vectors,
we choose successively the $(k-h)$ most extreme pairs. The lower bound to the efficiency
thus depends on the ratios of the extreme roots of $\alpha$. When the roots of $\alpha$ are all approximately
equal (approximately equal to unity if the error process is stationary) the efficiency must
be high. It will be shown in Part II that high serial correlation of the autoregressive and
moving-average types leads to $\alpha$ having roots of very different magnitudes and therefore
to the possibility of low efficiencies. Finally, it will be noted that the maximum and
minimum efficiencies are taken when the arbitrary regression vectors are respectively
latent vectors of $\alpha$ and sums of certain pairs of latent vectors of $\alpha$; using the invariance
properties of (3.1) these statements serve to define the maximizing and minimizing regression
spaces.
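The rule for forming the lower bound to the efficiency, a product of terms 4·lo·hi/(lo + hi)² over the most extreme pairs of free roots, can be sketched as follows. The sketch assumes that the roots already tied to latent-vector regressors have been removed, so that `free_roots` holds the remaining N − h roots and `m` plays the role of k − h. The function names and example roots are hypothetical, and `efficiency_single` covers only the special case of one regressor with a diagonal α.

```python
def efficiency_lower_bound(free_roots, m):
    """Product over the m most extreme pairs of 4*lo*hi/(lo+hi)^2."""
    r = sorted(free_roots)
    if m < 0 or 2 * m > len(r):
        raise ValueError("need 0 <= m and 2*m <= number of free roots")
    bound = 1.0
    for i in range(m):
        lo, hi = r[i], r[-1 - i]   # i-th smallest paired with i-th largest
        bound *= 4.0 * lo * hi / (lo + hi) ** 2
    return bound

def efficiency_single(x, roots):
    """Eff.(b) for one regressor when alpha = diag(roots):
    (x'x)^2 / (x'alpha x * x'alpha^{-1} x)."""
    xx = sum(v * v for v in x)
    xax = sum(a * v * v for a, v in zip(roots, x))
    xainvx = sum(v * v / a for a, v in zip(roots, x))
    return xx * xx / (xax * xainvx)

roots = [0.25, 0.5, 1.0, 2.0, 4.0]
bound_one = efficiency_lower_bound(roots, 1)
bound_two = efficiency_lower_bound(roots, 2)

# Sum of the latent vectors for the two extreme roots attains the bound.
worst_case = efficiency_single([1.0, 0.0, 0.0, 0.0, 1.0], roots)
# A single latent vector gives full efficiency.
best_case = efficiency_single([0.0, 0.0, 1.0, 0.0, 0.0], roots)
# Any other direction lies between the two.
other = efficiency_single([1.0, 1.0, 1.0, 1.0, 1.0], roots)
```

With these roots the one-regressor bound is 4/18.0625, about 0.22, so a sixteen-fold spread between the extreme roots already permits an efficiency below one quarter.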


4. BOUNDS ON THE SIGNIFICANCE POINTS OF THE t- AND F-TESTS 


The significance tests usually made in regression analysis relate to linear hypotheses about
the regression coefficients. Thus in the model (1.1), these hypotheses will be special cases of
a null hypothesis specifying the value of one or more linear functions of $\beta_1, \ldots, \beta_k$. When the

† This form of the lower bound was conjectured by J. Durbin.




errors are independently distributed with the same normal distribution, it is well known
that the optimal test statistics for such hypotheses have the t- or F-distributions. In this
section bounds will be obtained for the distributions of these statistics when the covariance
matrix of the errors is $\sigma^2\alpha$; these bounding distributions converge to the tabulated
distributions as $\alpha \to I$. We begin by showing that there is no loss of generality in considering
only hypotheses specifying the value of several regression coefficients and in assuming that
the regression vectors form an orthonormal set. The bounding distributions for this
reduced problem will then be derived. The section concludes with a short discussion of the
numerical determination of the bounding significance points.

We consider again the model (1.1), $y = X\beta + u$, where the errors $u$ have a multivariate-
normal distribution with zero mean vector and covariance matrix $\sigma^2\alpha$ and where the
regression vectors are linearly independent. We suppose that the null hypothesis to be
tested is
$$f_i'\beta = \phi_i \quad (i = 1, \ldots, h), \tag{4.1}$$
where $f_i$, $\phi_i$ ($i = 1, \ldots, h$) are given and $f_i$ ($i = 1, \ldots, h$) are linearly independent. The optimal
test of the hypothesis (4.1) when $\alpha = I$ is obtained by building up a quadratic form in the
variables $f_i'b - \phi_i$ ($b$ is the least-squares estimate of $\beta$ defined in (1.6)) which is distributed
as a multiple of $\chi^2$ with $h$ degrees of freedom. The form must therefore be based on the
inverse of the covariance matrix of the variables $f_i'b - \phi_i$. Because $\sigma^2$ is unknown it is
necessary to divide this form by the estimate $s^2$ of $\sigma^2$ given in (1.9) in order to obtain a test
statistic. The ratio so obtained has the F-distribution with $h$ and $N-k$ degrees of freedom
provided the null hypothesis is true. This is the statistic we must examine when $\alpha$ is not
necessarily equal to $I$.

If $X$ is transformed to $Z$ by the transformation
$$Z = XP, \tag{4.2}$$
it is always possible to find a non-singular $P$ so that $Z'Z = I$. For $P$ has only to satisfy
$$P'X'XP = I. \tag{4.3}$$
Defining $\delta = P^{-1}\beta$, where $P$ satisfies (4.3), (1.1) becomes
$$y = Z\delta + u. \tag{4.4}$$
The least-squares estimates of $\beta$ and $\delta$, $b$ and $d$ say, are given by $b = (X'X)^{-1}X'y$, $d = Z'y$.
It is easily seen that
$$d = P^{-1}b. \tag{4.5}$$

The denominator of the test statistic is $s^2$, and it has been seen in the derivation of (1.10)
that $s^2$ is proportional to
$$u'u - u'X(X'X)^{-1}X'u, \tag{4.6}$$
which becomes, under the transformation (4.2) and (4.3),
$$u'u - u'Z(Z'Z)^{-1}Z'u = u'u - u'ZZ'u. \tag{4.7}$$

Thus (4.6) is invariant under transformations (4.2) and (4.7) is invariant under
transformations (4.2) with $PP' = I$. On the null hypothesis, the variables in the numerator
are given by
$$f_i'b - \phi_i = f_i'(X'X)^{-1}X'u \quad (i = 1, \ldots, h). \tag{4.8}$$




Thus for $i = 1, \ldots, h$, $f_i'b - \phi_i$ are distributed as $f_i'b$ when $\beta = 0$ or as $f_i'Pd$ when $\delta = 0$. We
may express the numerator quadratic form in terms of $d$ without first stating it in terms of $b$.
Writing $f_i'P = g_i'$ it must on the null hypothesis have the form
$$[g_1'd, \ldots, g_h'd]\begin{bmatrix} g_1'g_1 & \cdots & g_1'g_h \\ \vdots & & \vdots \\ g_h'g_1 & \cdots & g_h'g_h \end{bmatrix}^{-1}\begin{bmatrix} g_1'd \\ \vdots \\ g_h'd \end{bmatrix} \tag{4.9}$$
because this is seen to be correct when $\alpha = I$. (4.9) may be written as
$$d'[g_1, \ldots, g_h]\begin{bmatrix} g_1'g_1 & \cdots & g_1'g_h \\ \vdots & & \vdots \\ g_h'g_1 & \cdots & g_h'g_h \end{bmatrix}^{-1}\begin{bmatrix} g_1' \\ \vdots \\ g_h' \end{bmatrix} d. \tag{4.10}$$

The matrix of this quadratic form in $d$ is symmetric and has $h$ unit roots and $k-h$ zero roots.
Thus an orthogonal transformation $H$ may be found so that if $d = Hc$, (4.10) reduces to
$c'Jc$, where
$$J = \begin{bmatrix} I_h & 0 \\ 0 & 0 \end{bmatrix},$$
a $k \times k$ matrix with units in the first $h$ places on the leading diagonal and zeros elsewhere.
Since $H$ is orthogonal,
$$c = H^{-1}d = H'Z'u. \tag{4.11}$$
If finally we put
$$W = ZH, \tag{4.12}$$
then $c = W'u$ and the test statistic is proportional to
$$\frac{u'WJW'u}{u'u - u'WW'u} = \frac{(u'w_1)^2 + \ldots + (u'w_h)^2}{u'u - (u'w_1)^2 - \ldots - (u'w_k)^2}, \tag{4.13}$$
where $w_1, \ldots, w_k$ form an orthonormal set of vectors. This is the desired canonical form for
the test statistic because it is the form which arises in testing the hypothesis
$$\psi_i = \psi_i^0 \quad (i = 1, \ldots, h)$$
in the model $y = W\psi + u$,
where $W'W = I$ and $\psi' = (\psi_1, \ldots, \psi_k)$. $c$ defined above is the least-squares estimate of $\psi$.

Since the proof above for the general linear hypothesis presents some difficulty, it seems
worth while to establish the result for a simple particular case. Suppose that the null
hypothesis is
$$\beta = \beta_0. \tag{4.14}$$
To obtain the numerator of the test statistic, we refer to (1.8) which defines the covariance
matrix of $b - \beta_0$, and (1.9) which gives an expression for $s^2$. Thus the statistic is proportional
to
$$\frac{(b - \beta_0)'(X'X)(b - \beta_0)}{(y - Xb)'(y - Xb)}. \tag{4.15}$$
Since $b = (X'X)^{-1}X'y$, the statistic (4.15) on the null hypothesis may be put in the form
$$\frac{u'X(X'X)^{-1}X'u}{u'u - u'X(X'X)^{-1}X'u}. \tag{4.16}$$
If now $X$ is transformed to $Z$, so that $Z'Z = I$, the discussion of (4.2) ... (4.7) shows that
(4.16) is reduced to
$$\frac{u'ZZ'u}{u'u - u'ZZ'u}. \tag{4.17}$$

This is the required canonical form for the hypothesis (4.14).




We consider the model of (1.1) and (1.2) under the assumptions of orthonormality (2.2).
The errors $u$ will now be assumed to have a multivariate normal distribution with zero mean
vector and covariance matrix $\sigma^2\alpha$. We suppose initially that the first $k-1$ of the regression
vectors $x$ are latent vectors of $\alpha$, the remaining vector being arbitrary. Denote $x_i = a_i$
where $\alpha a_i = \alpha_i a_i$ ($i = 1, \ldots, k-1$). To test the hypothesis
$$\beta_k = \beta_k^0 \tag{4.18}$$
the appropriate statistic is
$$t^2 = \frac{(N-k)(b_k - \beta_k^0)^2}{\sum_{t=1}^{N}(y_t - x_{1t}b_1 - \ldots - x_{kt}b_k)^2}. \tag{4.19}$$
On the null hypothesis $t^2/(N-k)$ has the matrix form
$$\frac{t^2}{N-k} = \frac{u'x_kx_k'u}{u'(I - a_1a_1' - \ldots - a_{k-1}a_{k-1}' - x_kx_k')u} = T, \text{ say}. \tag{4.20}$$


Bounds on the significance points of $T$ will now be found which enclose the true significance
point of $T$, no matter what the direction of the vector $x_k$.
By a device, often used in the treatment of ratios, we have
$$P(T < \tau) = P\big(u'\{x_kx_k' - \tau(I - a_1a_1' - \ldots - a_{k-1}a_{k-1}' - x_kx_k')\}u < 0\big). \tag{4.21}$$

But the quadratic form on the right-hand side of (4.21) is distributed like $\sum_{i=1}^{N}\nu_i\xi_i^2$, where
$\nu_i$ are the roots of the determinantal equation
$$|x_kx_k' - \tau(I - a_1a_1' - \ldots - a_{k-1}a_{k-1}' - x_kx_k') - \nu\alpha^{-1}| = 0, \tag{4.22}$$
and the $\xi_i$ are N.I.D. $(0, 1)$. For fixed $\tau$, let $x_k$ vary. Then the $\nu_i$ and $P(\sum\nu_i\xi_i^2 < 0)$ vary. We
will show that there exists a direction of $x_k$ such that all the $\nu_i$ take maximum values. For
this $x_k$, $P(\sum\nu_i\xi_i^2 < 0)$ is clearly a minimum. Since, for fixed $x_k$, $P(T < \tau)$ is a monotonically
increasing function of $\tau$, it follows that the significance point of $T$, $\tau_U$ say, for this special
maximizing direction is greater than the significance point at the same level for any other
vector $x_k$. In the same way a lower bound $\tau_L$ will be established.

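For this purpose apply to (4.22) the orthogonal transformation $K$ which diagonalizes $\alpha$,
$$K'\alpha K = \operatorname{diag}(\alpha_1, \ldots, \alpha_N). \tag{4.23}$$
If $K'x_k = m$,
then the first $k-1$ co-ordinates of $m$ are zero because $x_k$ is required to be orthogonal to the
$k-1$ latent vectors corresponding to $\alpha_1, \ldots, \alpha_{k-1}$ and
$$K'(I - a_1a_1' - \ldots - a_{k-1}a_{k-1}')K = \begin{bmatrix} 0 & 0 \\ 0 & I_{N-k+1} \end{bmatrix}, \tag{4.24}$$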




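where the matrix on the right-hand side of (4.24) is an $N \times N$ identity matrix with the first
$k-1$ diagonal elements replaced by zeros. Thus equation (4.22) has $k-1$ zero roots and
$N-k+1$ which satisfy the reduced determinantal equation
$$\begin{vmatrix} (1+\tau)m_k^2 - \tau - \nu/\alpha_k & (1+\tau)m_km_{k+1} & \cdots & (1+\tau)m_km_N \\ (1+\tau)m_{k+1}m_k & (1+\tau)m_{k+1}^2 - \tau - \nu/\alpha_{k+1} & \cdots & (1+\tau)m_{k+1}m_N \\ \vdots & \vdots & & \vdots \\ (1+\tau)m_Nm_k & (1+\tau)m_Nm_{k+1} & \cdots & (1+\tau)m_N^2 - \tau - \nu/\alpha_N \end{vmatrix} = 0. \tag{4.25}$$

Subtracting $m_{k+1}/m_k$ times the first from the second row, etc., and expanding the result as
a bordered determinant we find the equation in $\nu$,
$$\prod_{p=k}^{N}\left(\tau + \frac{\nu}{\alpha_p}\right) - (1+\tau)\sum_{p=k}^{N} m_p^2 \prod_{q \neq p}\left(\tau + \frac{\nu}{\alpha_q}\right) = 0, \tag{4.26}$$
i.e.
$$\sum_{p=k}^{N}\frac{m_p^2}{\tau + \nu/\alpha_p} = \frac{1}{1+\tau}. \tag{4.27}$$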
& 


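We see that the effect of the regression variables chosen as latent vectors has been the same as on previous occasions, i.e. the corresponding latent roots are eliminated from the problem. We may therefore without loss of generality suppose that

$$\alpha_k \le \alpha_{k+1} \le \dots \le \alpha_N. \tag{4.28}$$

The location of the roots $\nu_i$ is easily seen graphically from (4.27). We may, however, proceed as follows. Define the continuous function of $\nu$,

$$f(\nu) = \prod_{p=k}^{N}\left(\tau + \frac{\nu}{\alpha_p}\right) - (1+\tau)\sum_{i=k}^{N} m_i^2 \prod_{\substack{p=k \\ p \neq i}}^{N}\left(\tau + \frac{\nu}{\alpha_p}\right).$$

Then since

$$f(-\tau\alpha_j) = -(1+\tau)\,m_j^2 \prod_{\substack{p=k \\ p \neq j}}^{N}\left(\tau - \frac{\tau\alpha_j}{\alpha_p}\right)$$

is seen to have the opposite sign to $f(-\tau\alpha_{j+1})$, the equation (4.26) must have at least one root in each of the intervals $(-\tau\alpha_{j+1}, -\tau\alpha_j)$.

But, from equation (4.27), we see that

$$\sum_{i=k}^{N}\frac{m_i^2}{\tau + \alpha_N/\alpha_i} \le \frac{1}{1+\tau} \le \sum_{i=k}^{N}\frac{m_i^2}{\tau + \alpha_k/\alpha_i}$$

by (4.28) and the fact that $\sum m_i^2 = 1$. Hence there is at least one root of (4.26) and (4.27) in the interval $(\alpha_k, \alpha_N)$.

Combining the results, we have proved that there is one root $\nu_i$ in each of the $N-k+1$ intervals

$$(-\tau\alpha_{i+1}, -\tau\alpha_i) \quad (i = k, \dots, N-1), \qquad (\alpha_k, \alpha_N). \tag{4.29}$$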




Finally, if $x_k$ is chosen to be the latent vector associated with the root $\alpha_k$, i.e. $a_k$, (4.26) gives the roots

$$-\tau\alpha_N, \dots, -\tau\alpha_{k+1} \text{ and } \alpha_k, \tag{4.30}$$

which are seen from (4.29) to be the least possible set of roots. If $x_k$ is chosen to be the latent vector associated with $\alpha_N$, i.e. $a_N$, (4.26) gives the greatest possible set of roots

$$-\tau\alpha_{N-1}, \dots, -\tau\alpha_k \text{ and } \alpha_N. \tag{4.31}$$
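As a quick check of the least set of roots, the case in which $x_k$ is the latent vector $a_k$ can be worked directly from the reduced determinantal equation (4.25). This is only a sketch; the unit vector $e_k$ (1 in position $k$, zeros elsewhere) is notation introduced here and is not used in the text.

```latex
% x_k = a_k gives m = K'a_k = e_k, so m_k = 1 and m_{k+1} = ... = m_N = 0.
% The matrix in (4.25) is then diagonal and its determinant factorizes:
\left(1 - \frac{\nu}{\alpha_k}\right)
\prod_{i=k+1}^{N}\left(-\tau - \frac{\nu}{\alpha_i}\right) = 0
\quad\Longrightarrow\quad
\nu = \alpha_k
\ \text{ or } \
\nu = -\tau\alpha_i \quad (i = k+1, \dots, N).
```

The same argument with $m = e_N$ gives the greatest set, $-\tau\alpha_k, \dots, -\tau\alpha_{N-1}$ and $\alpha_N$.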
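We have therefore proved

THEOREM 1. If $\tau$ is such that $P(T<\tau) = C$ $(0<C<1)$, $T$ being defined by (4.20), then there exist numbers $\tau_L$, $\tau_U$ such that

$$\tau_L \le \tau \le \tau_U \tag{4.32}$$

for all vectors $x_k$. $\tau_L$ and $\tau_U$ are determined by the equations

$$P\left(\frac{\alpha_k\eta_k^2}{\alpha_{k+1}\eta_{k+1}^2 + \dots + \alpha_N\eta_N^2} < \tau_L\right) = C, \qquad
P\left(\frac{\alpha_N\eta_N^2}{\alpha_k\eta_k^2 + \dots + \alpha_{N-1}\eta_{N-1}^2} < \tau_U\right) = C, \tag{4.33}$$

where $\eta_k, \dots, \eta_N$ are N.I.D. (0, 1).

We now extend this theorem, with the same assumptions about the model, to a joint test of the null hypothesis

$$\beta_1 = \beta_1^0, \ \dots, \ \beta_h = \beta_h^0 \quad (h < k-1), \qquad \beta_k = \beta_k^0. \tag{4.34}$$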
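The statistic $T$ for this hypothesis is, on the null hypothesis,

$$T = \frac{u'a_1a_1'u + \dots + u'a_ha_h'u + u'x_kx_k'u}{u'(I - a_1a_1' - \dots - a_{k-1}a_{k-1}' - x_kx_k')u}. \tag{4.35}$$

Proceeding as before, the determinantal equation is found to be

$$|a_1a_1' + \dots + a_ha_h' + x_kx_k' - \tau(I - a_1a_1' - \dots - a_{k-1}a_{k-1}' - x_kx_k') - \nu\boldsymbol{\alpha}^{-1}| = 0. \tag{4.36}$$

After applying the orthogonal transformation $K$ of (4.23) the roots $\nu_i$ are seen to be $\alpha_1, \alpha_2, \dots, \alpha_h$, $k-h-1$ zeros, and the $N-k+1$ roots of (4.25). This leads immediately to the following extension of Theorem 1.

THEOREM 2. If $\tau$ is such that $P(T<\tau) = C$ $(0<C<1)$, $T$ being defined by (4.35), then there exist numbers $\tau_L$ and $\tau_U$ such that

$$\tau_L \le \tau \le \tau_U$$

for all vectors $x_k$. $\tau_L$ and $\tau_U$ are determined by the equations

$$P\left(\frac{\alpha_1\eta_1^2 + \dots + \alpha_h\eta_h^2 + \alpha_k\eta_k^2}{\alpha_{k+1}\eta_{k+1}^2 + \dots + \alpha_N\eta_N^2} < \tau_L\right) = C, \qquad
P\left(\frac{\alpha_1\eta_1^2 + \dots + \alpha_h\eta_h^2 + \alpha_N\eta_N^2}{\alpha_k\eta_k^2 + \dots + \alpha_{N-1}\eta_{N-1}^2} < \tau_U\right) = C, \tag{4.37}$$

where $\eta_1, \dots, \eta_h, \eta_k, \dots, \eta_N$ are N.I.D. (0, 1).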


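To complete this case, we examine the test of the null hypothesis

$$\beta_1 = \beta_1^0, \ \dots, \ \beta_h = \beta_h^0 \quad (h < k-1). \tag{4.38}$$

The $T$ statistic is here defined on the null hypothesis by

$$T = \frac{u'a_1a_1'u + \dots + u'a_ha_h'u}{u'(I - a_1a_1' - \dots - a_{k-1}a_{k-1}' - x_kx_k')u}. \tag{4.39}$$

The associated determinantal equation is

$$|a_1a_1' + \dots + a_ha_h' - \tau(I - a_1a_1' - \dots - a_{k-1}a_{k-1}' - x_kx_k') - \nu\boldsymbol{\alpha}^{-1}| = 0. \tag{4.40}$$

The roots of (4.40) are $\alpha_1, \dots, \alpha_h$, $k-h-1$ zeros and the roots of the determinantal equation

$$\begin{vmatrix}
\tau m_k^2 - \tau - \nu/\alpha_k & \tau m_km_{k+1} & \cdots & \tau m_km_N \\
\tau m_{k+1}m_k & \tau m_{k+1}^2 - \tau - \nu/\alpha_{k+1} & \cdots & \tau m_{k+1}m_N \\
\vdots & \vdots & & \vdots \\
\tau m_Nm_k & \tau m_Nm_{k+1} & \cdots & \tau m_N^2 - \tau - \nu/\alpha_N
\end{vmatrix} = 0. \tag{4.41}$$

Expanding as before, we find the equation in $\nu$

$$\prod_{p=k}^{N}\left(\tau + \frac{\nu}{\alpha_p}\right) - \tau\sum_{i=k}^{N} m_i^2 \prod_{\substack{p=k \\ p \neq i}}^{N}\left(\tau + \frac{\nu}{\alpha_p}\right) = 0, \tag{4.42}$$

i.e.

$$\sum_{i=k}^{N}\frac{m_i^2}{\tau + \nu/\alpha_i} = \frac{1}{\tau}. \tag{4.43}$$

As with equation (4.26), it can be shown that equation (4.42) has at least one root in each of the intervals

$$(-\tau\alpha_{i+1}, -\tau\alpha_i) \quad (i = k, \dots, N-1).$$

By inspection we see that it always has a zero root so that there is just one in each of these intervals. If $x_k$ is chosen to be the latent vector of $\boldsymbol{\alpha}$ associated with the root $\alpha_k$, i.e. $a_k$, the least set of roots for $\nu_i$ is obtained, while if $x_k$ is the latent vector of $\boldsymbol{\alpha}$ associated with the root $\alpha_N$, i.e. $a_N$, the greatest set is found. We may thus state

THEOREM 3. If $\tau$ is such that $P(T<\tau) = C$ $(0<C<1)$, $T$ being defined by (4.39), then there exist numbers $\tau_L$ and $\tau_U$ such that

$$\tau_L \le \tau \le \tau_U$$

for all vectors $x_k$. $\tau_L$ and $\tau_U$ are determined by the equations

$$P\left(\frac{\alpha_1\eta_1^2 + \dots + \alpha_h\eta_h^2}{\alpha_{k+1}\eta_{k+1}^2 + \dots + \alpha_N\eta_N^2} < \tau_L\right) = C, \qquad
P\left(\frac{\alpha_1\eta_1^2 + \dots + \alpha_h\eta_h^2}{\alpha_k\eta_k^2 + \dots + \alpha_{N-1}\eta_{N-1}^2} < \tau_U\right) = C, \tag{4.44}$$

where $\eta_1, \dots, \eta_h, \eta_k, \dots, \eta_N$ are N.I.D. (0, 1).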
We now examine a case where there is more than one arbitrary regression vector. Suppose that all $k$ regression vectors are arbitrary except for restriction (2.2) and that the hypothesis to be tested is

$$\beta_i = \beta_i^0 \quad (i = 1, \dots, k). \tag{4.45}$$


On the null hypothesis the usual $F$-ratio is distributed like a multiple of

$$T = \frac{u'XX'u}{u'(I - XX')u}; \tag{4.46}$$

therefore

$$R = \frac{T}{1+T} = \frac{u'XX'u}{u'u}. \tag{4.47}$$

Since $T$ is a monotonic function of $R$ $\left(T = R/(1-R)\right)$, the extremes of $P(R<r)$ will yield extremes of $P(T<\tau)$. We therefore proceed to find distributions which bound that of $R$ above and below. If $H$ is the linear transformation such that

$$H\boldsymbol{\alpha}H' = I, \tag{4.48}$$

then the variables $v = Hu$ are N.I.D. (0, 1), for $\sigma^2$ may be set equal to unity without loss of generality. Then

$$R = \frac{v'H'^{-1}XX'H^{-1}v}{v'(HH')^{-1}v}.$$

From (4.48),

$$(HH')^{-1} = \operatorname{diag}(\alpha_1, \dots, \alpha_N),$$

so that we may write

$$R = \frac{v'H'^{-1}XX'H^{-1}v}{\sum_{i=1}^{N}\alpha_i v_i^2}. \tag{4.49}$$


$R$ is a ratio of non-negative definite quadratic forms in N.I.D. (0, 1) variables. For variation of $X$, the upper and lower bounding random variables for $R$ are obtained when the numerator takes its least and greatest forms. Now the numerator of $R$ is distributed like

$$\sum_{i=1}^{k}\nu_i\eta_i^2, \tag{4.50}$$

where $\eta_1, \dots, \eta_k$ are N.I.D. (0, 1) and $\nu_1, \dots, \nu_k$ are the $k$ non-zero roots, written in ascending order of magnitude, of

$$|H'^{-1}XX'H^{-1} - \nu I| = 0, \tag{4.51}$$

i.e. the non-zero roots of

$$|H^{-1}H'^{-1}XX' - \nu I| = 0,$$

since the roots of a product are indifferent to the order of multiplication. But by (4.48)

$$H^{-1}H'^{-1} = (H'H)^{-1} = \boldsymbol{\alpha},$$

so that $\nu_1 \le \dots \le \nu_k$ are the $k$ non-zero roots of

$$|\boldsymbol{\alpha}XX' - \nu I| = 0. \tag{4.52}$$


Suppose that $z_1, \dots, z_{N-k}$ are vectors which together with $x_1, \dots, x_k$ form a complete orthonormal set in $N$-space. If $Z = [z_1, \dots, z_{N-k}]$, then

$$[X, Z]'[X, Z] = [X, Z][X, Z]' = I.$$

But $[X, Z][X, Z]' = XX' + ZZ'$, so that

$$XX' = I - ZZ' = (I - z_1z_1') \cdots (I - z_{N-k}z_{N-k}') = M_1 \cdots M_{N-k}, \text{ say,}$$

where $M_1, \dots, M_{N-k}$ are idempotent matrices of rank $N-1$.

With this characterization of $XX'$, inequalities on the roots of (4.52) may be obtained immediately from the investigation of Durbin & Watson (1950). They show, with a proof similar to that of Theorem 1, that the roots of $\boldsymbol{\alpha}M_1$ lie between the roots of $\boldsymbol{\alpha}$. Thus the non-zero roots of $\boldsymbol{\alpha}M_1M_2$ lie between the non-zero roots of $\boldsymbol{\alpha}M_1$. Combining the inequalities given by successive applications of this result, it follows that the roots $\nu_1, \dots, \nu_k$ of (4.52) must satisfy the inequalities

$$\alpha_1 \le \nu_1 \le \alpha_{N-k+1}, \quad \alpha_2 \le \nu_2 \le \alpha_{N-k+2}, \quad \dots, \quad \alpha_k \le \nu_k \le \alpha_N. \tag{4.53}$$

The least set of roots is obtained when the $k$ regression vectors are chosen to be the latent vectors $a_1, \dots, a_k$ (or linear combinations of these vectors). The greatest set of roots is obtained when the regression vectors are similarly related to the latent vectors $a_{N-k+1}, \dots, a_N$. These choices lead to the extreme forms of $R$ required. We may thus state


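THEOREM 4. If $\tau$ is such that $P(T<\tau) = C$ $(0<C<1)$, $T$ being defined by (4.46), then there exist numbers $\tau_L$ and $\tau_U$ such that

$$\tau_L \le \tau \le \tau_U$$

for all $X$. $\tau_L$ and $\tau_U$ are defined by the equations

$$P\left(\frac{\alpha_1\eta_1^2 + \dots + \alpha_k\eta_k^2}{\alpha_{k+1}\eta_{k+1}^2 + \dots + \alpha_N\eta_N^2} < \tau_L\right) = C, \qquad
P\left(\frac{\alpha_{N-k+1}\eta_{N-k+1}^2 + \dots + \alpha_N\eta_N^2}{\alpha_1\eta_1^2 + \dots + \alpha_{N-k}\eta_{N-k}^2} < \tau_U\right) = C, \tag{4.54}$$

where $\eta_1, \dots, \eta_N$ are N.I.D. (0, 1).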
The rules of formation of the random variables which lead to the bounding significance points are now evident from Theorems 1, 2, 3, 4 so that there is no need to give the details for every type of test which may arise. In all cases it is seen that the distributions of the bounding variables converge to the appropriate $F$-distribution as the $\alpha$'s approach equality.

The calculation of the bounding significance points discussed above presents some difficulty. We are faced with the determination of the significance points of a ratio of two independent weighted sums of squares of normal deviates. Since the weights are all positive, the method of Pitman & Robbins (1949) may be used when the $\alpha$'s are not too different. Some numerical results obtained by this method will be given in Part II. The amount of arithmetic is always considerable (the electronic computer of the Cambridge University Mathematical Laboratory was used in these calculations), so it is worth while examining approximate methods.
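Where only a rough idea of the bounding points is wanted, they can be gauged by simulation. The following Python sketch estimates $\tau_L$ and $\tau_U$ of Theorem 4 by Monte Carlo; it is not the method of Pitman & Robbins, and the latent roots `alphas`, the level `C`, the helper name `ratio_quantile` and the number of draws are all illustrative assumptions.

```python
import random


def ratio_quantile(num_weights, den_weights, level, draws=20000, seed=0):
    """Monte Carlo estimate of the `level` quantile of a ratio of two
    independent weighted sums of squares of N(0, 1) deviates."""
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        num = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in num_weights)
        den = sum(w * rng.gauss(0.0, 1.0) ** 2 for w in den_weights)
        samples.append(num / den)
    samples.sort()
    return samples[int(level * (draws - 1))]


# Hypothetical latent roots alpha_1 <= ... <= alpha_N (N = 8), with k = 2.
alphas = [0.4, 0.6, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0]
k, C = 2, 0.95

# Lower bound: regression vectors a_1, ..., a_k.
tau_L = ratio_quantile(alphas[:k], alphas[k:], C)
# Upper bound: regression vectors a_{N-k+1}, ..., a_N.
tau_U = ratio_quantile(alphas[-k:], alphas[:-k], C)
print(tau_L, tau_U)
```

Both calls reuse the same seed, so the two estimates are computed from the same deviates and differ only through the weights.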

To illustrate the approximations used in Part II, let us consider the random variable which generates the lower bound in Theorem 4, namely,

$$\frac{\alpha_1\eta_1^2 + \dots + \alpha_k\eta_k^2}{\alpha_{k+1}\eta_{k+1}^2 + \dots + \alpha_N\eta_N^2}.$$

The simplest approximation to this ratio is

$$\frac{\alpha_1 + \dots + \alpha_k}{\alpha_{k+1} + \dots + \alpha_N}\,F_{k,\,N-k},$$

where $F_{k,\,N-k}$ has the $F$-distribution with $k$ and $N-k$ degrees of freedom. If in a practical analysis we had some knowledge of the correlogram of the errors, this approach suggests that multiples of the usual significance points might be used to make a reasonable r… A better approximation may be obtained by replacing the numerator and denominator by multiples of $\chi^2$ with the correct variances as well as the correct means. The significance points of this approximation must be obtained by interpolation and do not have the simple interpretation of the first approximation.
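A small simulation gives a feel for how close the first approximation is when the latent roots are not too different. The Python sketch below compares the upper 5% point of the exact ratio with that of the scaled $F$ variable, both estimated by Monte Carlo rather than taken from tables; the roots `alphas`, the value of `k`, the seed and the helper names are hypothetical choices made only for illustration.

```python
import random


def weighted_chi2(weights, rng):
    """One draw of sum_i w_i * eta_i^2 with eta_i independent N(0, 1)."""
    return sum(w * rng.gauss(0.0, 1.0) ** 2 for w in weights)


def quantile(samples, level):
    ordered = sorted(samples)
    return ordered[int(level * (len(ordered) - 1))]


# Hypothetical latent roots (N = 10) and number of regression vectors.
alphas = [0.8, 0.9, 1.0, 1.0, 1.1, 1.1, 1.2, 1.2, 1.3, 1.4]
k = 2
num_w, den_w = alphas[:k], alphas[k:]

rng = random.Random(1)
draws = 20000

# Exact bounding variable: ratio of weighted sums of squares.
exact = [weighted_chi2(num_w, rng) / weighted_chi2(den_w, rng)
         for _ in range(draws)]

# First approximation: equal weights with matching means, which is
# (sum of numerator roots / sum of denominator roots) * F(k, N - k).
mean_num = sum(num_w) / len(num_w)
mean_den = sum(den_w) / len(den_w)
approx = [mean_num * weighted_chi2([1.0] * len(num_w), rng)
          / (mean_den * weighted_chi2([1.0] * len(den_w), rng))
          for _ in range(draws)]

q_exact = quantile(exact, 0.95)
q_approx = quantile(approx, 0.95)
print(q_exact, q_approx)
```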


The author has pleasure in recording his indebtedness to Mr J. Durbin for many helpful discussions.


APPENDIX 


A proof of the inequality (3.2)

THEOREM. Let $a_j > 0$, $b_j > 0$, $w_j \ge 0$ $(j = 1, \dots, n)$. Then

$$1 \le \frac{(\sum a_j^2w_j)(\sum b_j^2w_j)}{(\sum a_jb_jw_j)^2} \le \max_{i,j}\frac{(a_ib_j + a_jb_i)^2}{4a_ia_jb_ib_j}.$$

The extreme values are attained where at most two of $w_1, \dots, w_n$ are non-zero.

This theorem and its proof below are due to J. W. S. Cassels.
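Proof. As a straightforward extremal problem it may be shown that

$$1 \le \frac{(1 + kw)(1 + k^{-1}w)}{(1+w)^2} \le \frac{(1+k)(1+k^{-1})}{4} \quad \text{if } k > 0,\ w > 0.$$

Thus the theorem holds for $n = 2$, for put

$$k = \frac{a_2b_1}{a_1b_2}, \qquad w = \frac{a_2b_2w_2}{a_1b_1w_1}.$$

Furthermore, the extremes of the theorem are attained when at most two of $w_1, \dots, w_n$ are non-zero. It may be assumed without loss of generality that

$$\begin{vmatrix} a_i^2 & a_j^2 & a_k^2 \\ b_i^2 & b_j^2 & b_k^2 \\ a_ib_i & a_jb_j & a_kb_k \end{vmatrix} \ne 0.$$

For this determinant is given, apart from sign, by $\prod(a_ib_j - a_jb_i)$, so that if it vanishes we have, say,

$$\frac{a_i}{b_i} = \frac{a_j}{b_j}, \quad \text{i.e.} \quad a_i = \lambda a_j,\ b_i = \lambda b_j, \text{ where } \lambda > 0.$$

But if this is so the problem is reduced to one in $n-1$ variables.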




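To prove that the extrema of $(\sum a_j^2w_j)(\sum b_j^2w_j)/(\sum a_jb_jw_j)^2$ are attained when at most two of $w_1, \dots, w_n$ are non-zero, let $M$ be an extreme which is attained when, say, $w_1 \ne 0$, $w_2 \ne 0$, $w_3 \ne 0$. Then we have

$$\frac{\partial}{\partial w_k}\left\{\left(\sum a_j^2w_j\right)\left(\sum b_j^2w_j\right) - M\left(\sum a_jb_jw_j\right)^2\right\} = 0 \quad (k = 1, 2, 3),$$

and so

$$a_k^2Y + b_k^2X - 2Ma_kb_kZ = 0 \quad (k = 1, 2, 3),$$

where $X = \sum a_j^2w_j$, $Y = \sum b_j^2w_j$, $Z = \sum a_jb_jw_j$. But the determinant of these equations does not vanish. Hence

$$X = Y = MZ = 0,$$

which is impossible.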

This establishes the theorem. To obtain the inequality (3.2), it is only necessary to notice that the upper bound in the theorem may be put in the form

$$\max_{i,j}\ \frac{1}{4}\left(\frac{r_i}{r_j} + \frac{r_j}{r_i} + 2\right),$$

where $r_i = a_i/b_i$. Since the function $w + 1/w$ takes its maximum at the end of the range of variation of $w$, (3.2) follows.
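The theorem is easy to probe numerically. The sketch below (plain Python; the function names, the ranges of the random values and the two-term example are all illustrative assumptions) checks both bounds on randomly generated $a_j$, $b_j$, $w_j$, and then confirms that for $n = 2$ the upper bound is attained when $a_1b_1w_1 = a_2b_2w_2$, which corresponds to $w = 1$ in the extremal problem used in the proof.

```python
import math
import random


def cassels_ratio(a, b, w):
    """(sum a_j^2 w_j)(sum b_j^2 w_j) / (sum a_j b_j w_j)^2."""
    saa = sum(x * x * z for x, z in zip(a, w))
    sbb = sum(y * y * z for y, z in zip(b, w))
    sab = sum(x * y * z for x, y, z in zip(a, b, w))
    return saa * sbb / sab ** 2


def upper_bound(a, b):
    """max over i, j of (a_i b_j + a_j b_i)^2 / (4 a_i a_j b_i b_j)."""
    idx = range(len(a))
    return max(
        (a[i] * b[j] + a[j] * b[i]) ** 2 / (4 * a[i] * a[j] * b[i] * b[j])
        for i in idx for j in idx
    )


rng = random.Random(0)
violations = 0
for _ in range(2000):
    n = rng.randint(2, 6)
    a = [rng.uniform(0.1, 5.0) for _ in range(n)]
    b = [rng.uniform(0.1, 5.0) for _ in range(n)]
    w = [rng.uniform(0.01, 1.0) for _ in range(n)]
    r = cassels_ratio(a, b, w)
    if not (1.0 - 1e-9 <= r <= upper_bound(a, b) + 1e-9):
        violations += 1

# Two-term case: weights chosen so that a_1 b_1 w_1 = a_2 b_2 w_2.
a2, b2 = [1.0, 3.0], [2.0, 0.5]
w2 = [1.0 / (a2[0] * b2[0]), 1.0 / (a2[1] * b2[1])]
attained = cassels_ratio(a2, b2, w2)
print(violations, attained, upper_bound(a2, b2))
```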


REFERENCES 
AITKEN, A. C. (1934-5). On least squares and linear combinations of observations. Proc. Roy. Soc. Edinb. 55, 42-8.
AITKEN, A. C. (1948). On the estimation of many statistical parameters. Proc. Roy. Soc. Edinb. 62, 369.
COCHRANE, D. & ORCUTT, G. H. (1949). Application of least squares regression to relationships containing autocorrelated error terms. J. Amer. Statist. Ass. 44, 32-61.
DURBIN, J. & WATSON, G. S. (1950). Testing for serial correlation in least squares regression. I. Biometrika, 37, 409-28.
DURBIN, J. & WATSON, G. S. (1951). Testing for serial correlation in least squares regression. II. Biometrika, 38, 159-78.
HANNAN, E. J. (1955). Exact tests for serial correlation. Biometrika, 42, 133.
HARDY, G. H., LITTLEWOOD, J. E. & PÓLYA, G. (1934). Inequalities. Cambridge University Press.
PITMAN, E. J. G. & ROBBINS, H. (1949). Application of the method of mixtures to quadratic forms in normal variates. Ann. Math. Statist. 20, 552-60.
WOLD, H. (1949). On least squares regression with autocorrelated variables and residuals. Trans. Int. Inst. Statist. Reprint, pp. 1-13.




SOME THEOREMS AND SUFFICIENCY CONDITIONS FOR THE 
MAXIMUM-LIKELIHOOD ESTIMATOR OF AN UNKNOWN 
PARAMETER IN A SIMPLE MARKOV CHAIN 


By J. GANI 
Australian National University, Canberra, A.C.T. 


I. SUMMARY 


The paper begins with proofs of the usual theorems for the optimum properties of the maximum-likelihood estimator of an unknown parameter $\theta$ which defines the transition probabilities $p_{ij}(\theta)$ of a simple ergodic Markov chain. By an ergodic chain is meant one for which, not only is the final chain stationary, but also all possible initial states remain permanently available; these conditions are sufficient to prove that the maximum-likelihood estimator is consistent, and asymptotically normally distributed.

The paper proceeds to establish the form of the transition probabilities $p_{ij}(\theta)$ which admit a sufficient estimator of $\theta$. To do this, the form of the likelihood function admitting a sufficient estimator when the parent distribution is discrete is first derived: this is used to obtain the form of the probabilities $p_i(\theta)$ for a multinomial distribution admitting a sufficient estimator of $\theta$, and the result is finally generalized for the transition probabilities $p_{ij}(\theta)$ of the simple ergodic Markov chain.

The paper closes with an examination of possible forms for the matrix $p$ of transition probabilities $p_{ij}(\theta)$, and these are illustrated with simple examples for Markov chains with two and three states.


II. THE LIKELIHOOD EQUATION FOR A SIMPLE MARKOV CHAIN


Consider the simple Markov chain with $s$ possible states $E_1, \dots, E_s$ and the matrix of transition probabilities $p_{ij}(\theta) = \Pr(E_j \mid E_i)$ $(i, j = 1, \dots, s)$, which are all functions of an unknown parameter $\theta$. The matrix is given by

$$p(\theta) = \begin{pmatrix} p_{11}(\theta) & \cdots & p_{1s}(\theta) \\ \vdots & & \vdots \\ p_{s1}(\theta) & \cdots & p_{ss}(\theta) \end{pmatrix},$$

where the transition probabilities are subject to the condition that

$$\sum_{j=1}^{s} p_{ij} = 1, \quad \text{for all } i = 1, 2, \dots, s. \tag{1}$$

If a realization of the chain results in an observed sequence $S$ of $n+1$ states in the following order

$$E_\alpha, E_\beta, \dots, E_\psi, E_\omega,$$

then it is possible to write the probability of this sequence as

$$f(S) = a_\alpha(\theta)\,p_{\alpha\beta}(\theta) \cdots p_{\psi\omega}(\theta), \tag{2}$$

where $a_i(\theta)$ $(i = 1, \dots, s)$, the initial probability distribution of the states $E_1, \dots, E_s$, is assumed known.


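The likelihood function will be given by

$$L = \ln f(S) = \ln a_\alpha + \ln p_{\alpha\beta} + \dots + \ln p_{\psi\omega}.$$

Grouping together the transitions from states $E_i$ to states $E_j$ $(i, j = 1, \dots, s)$, and denoting the frequency of these transitions by $n_{ij}$, we write this likelihood function as

$$L = \ln a_\alpha + \sum_{i=1}^{s}\sum_{j=1}^{s} n_{ij}\ln p_{ij},$$

or for simplicity

$$L = \ln a_\alpha + \sum n_{ij}\ln p_{ij}. \tag{3}$$

It follows that, provided $a_\alpha(\theta)$, $p_{ij}(\theta)$ are differentiable with respect to $\theta$, the derivative of the likelihood is

$$\frac{dL}{d\theta} = \frac{1}{a_\alpha}\frac{da_\alpha}{d\theta} + \sum\frac{n_{ij}}{p_{ij}}\frac{dp_{ij}}{d\theta}. \tag{4}$$

As $n$ increases, the second parts of (3) and (4) become dominant, and the likelihood function and its derivative will be

$$L \sim \sum n_{ij}\ln p_{ij}, \tag{5}$$

and

$$\frac{dL}{d\theta} \sim \sum\frac{n_{ij}}{p_{ij}}\frac{dp_{ij}}{d\theta}. \tag{6}$$

The maximum-likelihood estimator $T$ of $\theta$ will in this case be given by

$$\left(\frac{dL}{d\theta}\right)_T = \sum\left[\frac{n_{ij}}{p_{ij}}\frac{dp_{ij}}{d\theta}\right]_T = 0. \tag{7}$$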

It is important to note that the $n_{ij}$ are not linearly independent; the number of transitions from the state $E_i$ to the states $E_1, \dots, E_s$ will, except for a possible end-effect, be equal to the number of transitions from the states $E_1, \dots, E_s$ to the state $E_i$, so that

$$\sum_{j=1}^{s} n_{ij} \doteq \sum_{r=1}^{s} n_{ri}, \tag{8}$$

where the sign $\doteq$ indicates equality or a possible difference of 1 between the sums. As $n$ increases, we may therefore accept the equations

$$\sum_{j=1}^{s} n_{ij} = \sum_{r=1}^{s} n_{ri} = n_i \quad (i = 1, \dots, s), \tag{9}$$

where

$$\sum_{i=1}^{s} n_i = \sum_{i=1}^{s}\sum_{j=1}^{s} n_{ij} = n.$$
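The quantities of this section can be illustrated with a short Python sketch. It counts the transition frequencies $n_{ij}$ from an observed sequence, evaluates the dominant part of the likelihood (5), and locates the maximum by a crude grid search. The two-state chain with $p_{12} = p_{21} = \theta$, the observed sequence and the function names are hypothetical choices made only for illustration.

```python
import math


def transition_counts(states, s):
    """n[i][j] = number of observed transitions from state i to state j."""
    n = [[0] * s for _ in range(s)]
    for i, j in zip(states, states[1:]):
        n[i][j] += 1
    return n


def log_likelihood(theta, n):
    """Dominant part of the likelihood, sum n_ij ln p_ij, for the
    hypothetical chain p = [[1 - theta, theta], [theta, 1 - theta]]."""
    p = [[1.0 - theta, theta], [theta, 1.0 - theta]]
    return sum(n[i][j] * math.log(p[i][j])
               for i in range(2) for j in range(2))


# Hypothetical realization of n + 1 = 11 states, coded 0 and 1.
sequence = [0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1]
counts = transition_counts(sequence, 2)

# Crude grid search for the maximum-likelihood estimate T.
grid = [g / 1000.0 for g in range(1, 1000)]
T = max(grid, key=lambda theta: log_likelihood(theta, counts))

# Row and column sums agree up to the end-effect noted in (8).
row_sums = [sum(counts[i]) for i in range(2)]
col_sums = [counts[0][i] + counts[1][i] for i in range(2)]
print(counts, T, row_sums, col_sums)
```

For this chain the likelihood equation can also be solved by hand, giving $T = (n_{12} + n_{21})/n$, which the grid search reproduces.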


III. OPTIMUM PROPERTIES OF THE MAXIMUM-LIKELIHOOD ESTIMATOR 


Following closely the proofs given by Cramér (1946) and Rao (1952) for continuous distributions, we shall deduce that for an ergodic chain, the maximum-likelihood estimator $T$ obtained as a solution of (7) has the following optimum properties:

(a) The estimator $T$ obtained as a solution of (7) is consistent.

(b) The consistent solution of (7) is asymptotically normally distributed about the true value $\theta_0$: it is fully efficient.

A further optimum property due to Huzurbazar (1948) can also be proved:

(c) Any consistent solution of (7) is such that for $n \to \infty$ the probability that the likelihood tends to a maximum converges to 1; the consistent solution is therefore unique.






The proof of this property for the case of the chain considered is, however, identical in its essentials with that given by Huzurbazar for continuous distributions; it will therefore be omitted. In order to establish these properties, we require the assumptions that

$$\frac{dp_{ij}}{d\theta}, \quad \frac{d^2p_{ij}}{d\theta^2}, \quad \frac{d^3p_{ij}}{d\theta^3}$$

exist and are continuous for every $\theta$ in a range $R$ including the true value $\theta_0$. Before proceeding to prove these theorems, it may be useful to mention some properties of ergodic chains which we shall refer to.

For an ergodic chain, all possible initial states remain permanently available, and the final chain is stationary and independent of initial conditions. Although it is possible for some transition probabilities $p_{ij}$ of an ergodic chain to be zero, the stationary probabilities $P_i = \Pr(E_i)$ $(i = 1, \dots, s)$, which are given by the matrix equation

$$p'P = P,$$

together with the equation $\sum_i P_i = 1$, where $P$ is the column vector of elements $P_i$, will all be non-zero; it is clear that the $P_i$ will also be functions of the parameter $\theta$. In equation (2) no assumption was made about the chain being initially stationary, so that the $a_i$ are not necessarily equal to the $P_i$; however, if the process is initially stationary, we can write $a_i = P_i$ in this and the subsequent equations (3) and (4). In all that follows, we shall for simplicity consider an initially stationary chain; the results obtained will also hold asymptotically for a chain which is not initially stationary.
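For a given numerical matrix $p$ the stationary probabilities can be found in many ways. One minimal Python sketch repeatedly applies $p'$ to a starting vector, which converges for an ergodic chain that is not periodic. The $2 \times 2$ matrix, the iteration count and the function name are illustrative assumptions.

```python
def stationary(p, iterations=500):
    """Approximate the solution of p'P = P with sum(P) = 1 by repeated
    multiplication, starting from the uniform distribution."""
    s = len(p)
    P = [1.0 / s] * s
    for _ in range(iterations):
        P = [sum(P[i] * p[i][j] for i in range(s)) for j in range(s)]
    return P


# Hypothetical two-state chain; each row sums to 1 as required by (1).
p = [[0.9, 0.1],
     [0.4, 0.6]]
P = stationary(p)
print(P)  # close to [0.8, 0.2]
```

The result can be confirmed by substitution: $0.8 \times 0.9 + 0.2 \times 0.4 = 0.8$ and $0.8 \times 0.1 + 0.2 \times 0.6 = 0.2$.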

The following set of results for an ergodic chain is connected with the matrix $R$, with a transpose of form

$$R' = \begin{pmatrix} p_{11}e^{t_{11}} & p_{12}e^{t_{12}} & \cdots & p_{1s}e^{t_{1s}} \\ \vdots & \vdots & & \vdots \\ p_{s1}e^{t_{s1}} & p_{s2}e^{t_{s2}} & \cdots & p_{ss}e^{t_{ss}} \end{pmatrix}, \tag{10}$$

which was introduced by Montroll (1947) and Bartlett (1951), and will later be used to obtain the moment generating function of the transition frequencies $n_{ij}$, previously mentioned in equation (5). It is easy to see that the latent roots $\mu(t)$ of this matrix, where $t$ is the matrix of elements $t_{ij}$ $(i, j = 1, \dots, s)$, are given by the equation

$$|R - \mu I| = 0, \tag{11}$$

and are continuous in $t$. For $t = 0$, this determinantal equation becomes

$$|p - \mu I| = 0, \tag{12}$$

with roots $\mu_1(0), \dots, \mu_s(0)$, not necessarily all distinct; we may, without loss of generality, assume that these are arranged so that their moduli are in descending order of magnitude. For a stationary chain, it is known that

$$\mu_1(0) = 1, \qquad |\mu_r(0)| < 1 \text{ for the remaining } r = 2, 3, \dots, s;$$

it follows from the continuity of the roots $\mu_1(t), \dots, \mu_s(t)$, that for $t$ in the neighbourhood of $t = 0$, we have

$$|\mu_r(t)| < |\mu_1(t)| \quad \text{for all } r = 2, 3, \dots, s.$$




We prove also that for an ergodic chain, for some $t$ in the neighbourhood of $t = 0$, $\mu_1(t)$ is not identically equal to 1. For, suppose $\mu_1(t) \equiv 1$; then for $t$ such that $t_{11} \ne 0$, and $t_{ij} = 0$ for all other values of $i, j$, the equation (11) would give

$$\begin{vmatrix} p_{11}e^{t_{11}} - 1 & p_{12} & \cdots & p_{1s} \\ p_{21} & p_{22} - 1 & \cdots & p_{2s} \\ \vdots & \vdots & & \vdots \\ p_{s1} & p_{s2} & \cdots & p_{ss} - 1 \end{vmatrix} = 0. \tag{13}$$

On expansion, this could be written

$$(p_{11}e^{t_{11}} - 1)C_{11} + p_{12}C_{12} + \dots + p_{1s}C_{1s} = 0,$$

where the $C_{1j}$ are cofactors of the elements in the first row and $j$th column. For $t_{11}$ small, this is

$$p_{11}t_{11}C_{11} + (p_{11} - 1)C_{11} + p_{12}C_{12} + \dots + p_{1s}C_{1s} = 0.$$

Now if $t_{11} = 0$ also, so that $t = 0$, equation (13) would give

$$(p_{11} - 1)C_{11} + p_{12}C_{12} + \dots + p_{1s}C_{1s} = 0;$$

we see, therefore, that $\mu_1(t) \equiv 1$ only if $p_{11}t_{11}C_{11} = 0$, so that $p_{11} = 0$ or $C_{11} = 0$, or both are zero. $C_{11}$ may be taken as non-zero, for since $\mu_1(0) = 1$ is a simple root, then

$$\left[\frac{d}{d\mu}\,|p - \mu I|\right]_{\mu = 1} \ne 0,$$

or, on expansion,

$$-\{C_{11} + C_{22} + \dots + C_{ss}\} \ne 0,$$

where the $C_{rr}$ are cofactors of the elements in the leading diagonal of (13) when $t_{11} = 0$. At least one of these $C_{rr}$ is non-zero, and we may without loss of generality assume that $C_{11}$ is such a non-zero cofactor. If, in addition, $p_{11}$ is non-zero, $\mu_1(t) \ne 1$ for at least the case when $t_{11} \ne 0$ and $t_{ij} = 0$ for all other $i, j$.

It is possible, however, that $p_{11}$ be zero; if this is so, then for an ergodic chain, at least two of the probabilities $p_{12}, \dots, p_{1s}$ will be non-zero.* From equation (12) we see that in this case

$$-C_{11} + p_{12}C_{12} + \dots + p_{1s}C_{1s} = 0,$$

so that it is clear that at least one of the cofactors $C_{1r}$ $(r = 2, \dots, s)$ which multiplies a non-zero probability $p_{1r}$ is non-zero. Let $t$ now be such that $t_{1r} \ne 0$, and the remaining $t_{ij}$ are zero; then in exactly the same way as before, for a small value of $t_{1r}$, it can be shown that $\mu_1(t) \equiv 1$ only if $p_{1r}C_{1r} = 0$, a condition which is absurd. So that for $p_{11} = 0$, there also exists a $t$ for which $\mu_1(t) \ne 1$. It follows therefore that in general for an ergodic chain, for some $t$ in the neighbourhood of $t = 0$, $\mu_1(t)$ is not identically equal to 1.
(1) Consistency of the maximum-likelihood estimator

If the estimator $T$ obtained as a solution of (7) is to be consistent, it is necessary that if $\theta_0$ is the real value of $\theta$, then as $n \to \infty$,

$$\Pr(|T - \theta_0| < \delta) \to 1, \tag{14}$$

where $\delta$ is any small positive number.


* By an ergodic chain is usually understood one for … It is possible, however, that a particular $p_{ij}$ = … tracted, and we may therefore accept that for an ergodic chain, $0 \le p_{ij} < 1$ for all $i, j = 1, \dots, s$. Our statement follows immediately.






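Consider the expansion

$$\frac{dL}{d\theta} = \left(\frac{dL}{d\theta}\right)_{\theta_0} + (\theta - \theta_0)\left(\frac{d^2L}{d\theta^2}\right)_{\theta_0} + \tfrac{1}{2}(\theta - \theta_0)^2\left(\frac{d^3L}{d\theta^3}\right)_{\theta_1}, \tag{15}$$

where $\theta_1$ lies in $(\theta_0, \theta)$. For simplicity, we write $\left(\frac{dL}{d\theta}\right)_{\theta_0}$, $\left(\frac{d^2L}{d\theta^2}\right)_{\theta_0}$, $\left(\frac{d^3L}{d\theta^3}\right)_{\theta_1}$ as $nB_0$, $nB_1$, $nB_2$ respectively, where these are obtained from (6) as

$$\begin{aligned}
nB_0 &= \sum\left[\frac{n_{ij}}{p_{ij}}\frac{dp_{ij}}{d\theta}\right]_{\theta_0}, \\
nB_1 &= \sum n_{ij}\left[\frac{1}{p_{ij}}\frac{d^2p_{ij}}{d\theta^2} - \frac{1}{p_{ij}^2}\left(\frac{dp_{ij}}{d\theta}\right)^2\right]_{\theta_0}, \\
nB_2 &= \sum n_{ij}\left[\frac{1}{p_{ij}}\frac{d^3p_{ij}}{d\theta^3} - \frac{3}{p_{ij}^2}\frac{dp_{ij}}{d\theta}\frac{d^2p_{ij}}{d\theta^2} + \frac{2}{p_{ij}^3}\left(\frac{dp_{ij}}{d\theta}\right)^3\right]_{\theta_1}.
\end{aligned} \tag{16}$$

It is simple to see that as $n \to \infty$, the expectations of the $n_{ij}$, irrespective of whether the chain is initially or only finally stationary, will be given for all $i, j = 1, 2, \dots, s$, by

$$E(n_{ij}) = nP_ip_{ij}.$$

Further, since on differentiation with respect to $\theta$, equation (1) gives

$$\sum_j\frac{dp_{ij}}{d\theta} = 0, \qquad \sum_j\frac{d^2p_{ij}}{d\theta^2} = 0, \qquad \sum_j\frac{d^3p_{ij}}{d\theta^3} = 0,$$

we obtain for the expectations of $nB_0$, $nB_1$, $nB_2$ the results

$$\begin{aligned}
nE(B_0) &= n\left[\sum_i P_i\sum_j\frac{dp_{ij}}{d\theta}\right]_{\theta_0} = 0, \\
nE(B_1) &= -n\left[\sum\frac{P_i}{p_{ij}}\left(\frac{dp_{ij}}{d\theta}\right)^2\right]_{\theta_0} = -n\,i(\theta_0), \\
nE(B_2) &= n\left[\sum P_i\left\{\frac{2}{p_{ij}^2}\left(\frac{dp_{ij}}{d\theta}\right)^3 - \frac{3}{p_{ij}}\frac{dp_{ij}}{d\theta}\frac{d^2p_{ij}}{d\theta^2}\right\}\right]_{\theta_1} = n\,K(\theta_1),
\end{aligned} \tag{17}$$

where $i(\theta_0)$ is obviously finite and positive. We now prove that the variates $n_{ij}n^{-1}$ converge in probability to their expectations; from this it will follow that although the $n_{ij}n^{-1}$ are not independent, $B_0$, $B_1$, $B_2$, which are linear functions of these variates, also jointly converge in probability to their expectations.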
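The quantity $i(\theta_0) = \sum (P_i/p_{ij})(dp_{ij}/d\theta)^2$ appearing in (17) is easily evaluated for a particular chain. The Python sketch below does so for a hypothetical two-state chain with $p_{12} = p_{21} = \theta$, using a central difference for the derivatives and simple iteration for the stationary probabilities; the chain, the step `h` and the function names are all assumptions made for illustration.

```python
def transition_matrix(theta):
    """Hypothetical one-parameter chain with p_12 = p_21 = theta."""
    return [[1.0 - theta, theta], [theta, 1.0 - theta]]


def stationary(p, iterations=500):
    """Approximate the solution of p'P = P by repeated multiplication."""
    s = len(p)
    P = [1.0 / s] * s
    for _ in range(iterations):
        P = [sum(P[i] * p[i][j] for i in range(s)) for j in range(s)]
    return P


def information(theta, h=1e-6):
    """i(theta) = sum_ij (P_i / p_ij) (dp_ij / dtheta)^2, with the
    derivatives approximated by central differences."""
    p = transition_matrix(theta)
    hi = transition_matrix(theta + h)
    lo = transition_matrix(theta - h)
    P = stationary(p)
    total = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if p[i][j] > 0.0:
                d = (hi[i][j] - lo[i][j]) / (2.0 * h)
                total += P[i] * d * d / p[i][j]
    return total


info = information(0.3)
print(info)  # close to 1 / (0.3 * 0.7)
```

For this chain $P_1 = P_2 = \tfrac12$ and every derivative is $\pm 1$, so $i(\theta) = 1/\{\theta(1-\theta)\}$, in agreement with the numerical value.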

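In order to prove the convergence in probability of the variates $n_{ij}n^{-1}$ to their expectations $P_ip_{ij}$, we shall evaluate their variances using a method described in Fréchet (1952, p. 73). A transition frequency $n_{ij}$ is treated as the sum of $n$ variates $X_{ij}^{(r)}$, so that

$$n_{ij} = \sum_{r=1}^{n} X_{ij}^{(r)},$$

where $X_{ij}^{(r)}$ takes the values 1 or 0 at the $r$th transition, depending on whether this transition is or is not from the state $E_i$ to the state $E_j$. It is clear that

$$\begin{aligned}
E(X_{ij}^{(r)}) &= P_ip_{ij}, \\
V(X_{ij}^{(r)}) &= E\{(X_{ij}^{(r)})^2\} - E^2(X_{ij}^{(r)}) = P_ip_{ij} - P_i^2p_{ij}^2 \quad (r = 1, 2, \dots, n),
\end{aligned}$$

and

$$\begin{aligned}
C(X_{ij}^{(r)}X_{ij}^{(t)}) &= E(X_{ij}^{(r)}X_{ij}^{(t)}) - E(X_{ij}^{(r)})E(X_{ij}^{(t)}) \\
&= P_ip_{ij}\,p_{ji}^{(t-r-1)}\,p_{ij} - P_i^2p_{ij}^2 \quad (r, t = 1, 2, \dots, n;\ t > r),
\end{aligned}$$

where $p_{ji}^{(t-r-1)}$ is the probability that a transition from the state $E_j$ to the state $E_i$ occur in $t-r-1$ steps, and is the $(j, i)$th element of the matrix $p^{t-r-1}$, the $(t-r-1)$th power of the matrix $p$.

The variance $\sigma_{ij}^2$ of $n_{ij}$ is therefore given by

$$\sigma_{ij}^2 = V\left(\sum_{r=1}^{n} X_{ij}^{(r)}\right) = n(P_ip_{ij} - P_i^2p_{ij}^2) + 2P_ip_{ij}^2\sum_{r<t}\left(p_{ji}^{(t-r-1)} - P_i\right),$$

where there are $\tfrac{1}{2}n(n-1)$ terms of the form $(p_{ji}^{(t-r-1)} - P_i)$. Now, since the chain is ergodic, and therefore stationary, it is known that for all $t-r-1 > n_0$, where $n_0$ is some finite positive integer, the terms

$$\left|p_{ji}^{(t-r-1)} - P_i\right|$$

converge to zero at least as fast as the terms of a convergent geometric progression with a ratio $q$ where $0 < q < 1$ depends on $n_0$. If, therefore, we write the variance $\sigma_{ij}^2$ as

$$\sigma_{ij}^2 = n(P_ip_{ij} - P_i^2p_{ij}^2) + 2P_ip_{ij}^2\sum_{k=0}^{n-2}(n-k-1)\left(p_{ji}^{(k)} - P_i\right),$$

we see that $\sigma_{ij}^2n^{-1}$ will clearly converge to

$$\lim_{n\to\infty}\sigma_{ij}^2n^{-1} = P_ip_{ij} - P_i^2p_{ij}^2 + 2P_ip_{ij}^2S_{ji} = A,$$

where $A$ is some value independent of $n$, and $S_{ji} = \sum_{k=0}^{\infty}\left(p_{ji}^{(k)} - P_i\right)$.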


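We now prove that for ergodic chains, the limit $A$ must be non-zero. To do this, we use a theorem given in Fréchet (1952, pp. 86-8) which applies to the frequencies $n_i$ of (9), giving the number of times in a realization that the system is in state $E_i$. The variances $\sigma_i^2 = V(n_i)$ of these can be shown, by a method similar to that used for the $n_{ij}$ above, to be such that $\lim_{n\to\infty}\sigma_i^2n^{-1}$ is some finite value independent of $n$. The theorem proves that for an ergodic chain of the type we consider, $\lim_{n\to\infty}\sigma_i^2n^{-1}$ cannot be zero.

In order to apply this result to our transition frequencies $n_{ij}$ we redefine our system of states in the following fashion, assuming first that the $p_{ij}$ are all non-zero. We define a system of $s^2$ states $E_{ij}$ $(i, j = 1, 2, \dots, s)$, in which the system will be in state $E_{ij}$ when there is a transition from state $E_i$ to state $E_j$ of the original system; the new stochastic matrix for this system will be

$$\gamma = \begin{pmatrix}
p_{11} \cdots p_{1s} & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
0 \cdots 0 & p_{21} \cdots p_{2s} & \cdots & 0 \cdots 0 \\
\vdots & \vdots & & \vdots \\
0 \cdots 0 & 0 \cdots 0 & \cdots & p_{s1} \cdots p_{ss} \\
p_{11} \cdots p_{1s} & 0 \cdots 0 & \cdots & 0 \cdots 0 \\
\vdots & \vdots & & \vdots
\end{pmatrix},$$

the rows being taken in the order $E_{11}, E_{12}, \dots, E_{1s}, E_{21}, \dots, E_{ss}$.

If the original stochastic matrix is written as

$$p = (\alpha_1\ \alpha_2\ \dots\ \alpha_s) = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_s \end{pmatrix},$$

where $\alpha_i$, $\beta_i$ are respectively column and row vectors, then $\gamma$ can be written

$$\gamma = \begin{pmatrix}
\beta_1 & 0 & \cdots & 0 \\
0 & \beta_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \beta_s \\
\beta_1 & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots
\end{pmatrix}.$$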
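The construction of the enlarged chain can be made concrete with a few lines of Python. The sketch below builds $\gamma$ from a given $p$, using the rule that state $E_{ij}$ can only be followed by a state $E_{jm}$, with probability $p_{jm}$, and then squares it. The numerical matrix and the helper names are illustrative assumptions.

```python
def expanded_matrix(p):
    """Stochastic matrix of the s^2-state chain whose state (i, j)
    records a transition i -> j of the original chain."""
    s = len(p)
    states = [(i, j) for i in range(s) for j in range(s)]
    gamma = [[p[l][m] if l == j else 0.0 for (l, m) in states]
             for (i, j) in states]
    return states, gamma


def matmul(a, b):
    """Product of two square matrices held as lists of rows."""
    size = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(size))
             for c in range(size)] for r in range(size)]


# Hypothetical two-state chain, giving a 4 x 4 matrix gamma.
p = [[0.9, 0.1],
     [0.4, 0.6]]
states, gamma = expanded_matrix(p)
gamma2 = matmul(gamma, gamma)
for state, row in zip(states, gamma):
    print(state, row)
```

Each row of `gamma` sums to 1, and the element of `gamma2` in row $(i, j)$ and column $(l, m)$ is $p_{jl}\,p_{lm}$.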


It is clear that $n_{ij}$ will now indicate the number of times in a realization of the chain that the system is in the state $E_{ij}$; in other words, the $n_{ij}$ in the new system are the analogues of the $n_i$ in the original system. All that remains to be proved is that if the original system is ergodic, so also will the new system; this is intuitively obvious, and can be easily shown by powering the matrix $\gamma$ as follows. If we write for the $n$th power of the matrix $p$

$$p^n = (\alpha_1^{(n)}\ \alpha_2^{(n)}\ \dots\ \alpha_s^{(n)}),$$

it is seen directly that on multiplying the matrix $\gamma$ by itself, we obtain

$$\gamma^2 = \begin{pmatrix} \alpha_1\beta_1 & \alpha_2\beta_2 & \cdots & \alpha_s\beta_s \\ \vdots & \vdots & & \vdots \\ \alpha_1\beta_1 & \alpha_2\beta_2 & \cdots & \alpha_s\beta_s \end{pmatrix},$$

and similarly, since for $n \to \infty$,

$$\lim_{n\to\infty}\alpha_i^{(n)} = \begin{pmatrix} P_i \\ \vdots \\ P_i \end{pmatrix},$$

then

$$\lim_{n\to\infty}\gamma^n = \begin{pmatrix} P_1p_{11} & P_1p_{12} & \cdots & P_sp_{ss} \\ \vdots & \vdots & & \vdots \\ P_1p_{11} & P_1p_{12} & \cdots & P_sp_{ss} \end{pmatrix}.$$

It follows therefore by Fréchet's theorem that $\lim_{n\to\infty}\sigma_{ij}^2n^{-1} \ne 0$. If a certain $p_{ij}$ is zero, then in our new system the state $E_{ij}$ must be eliminated, since no transition into or from this state is possible. It can be verified, however, that the results above will hold equally well in such a case. We now conclude from Tchebyshev's theorem that the variates $n_{ij}n^{-1}$ converge in probability to their expectations since

$$\Pr\{|n_{ij}n^{-1} - P_ip_{ij}| \ge \epsilon\} \le \sigma_{ij}^2/n^2\epsilon^2 \sim A/n\epsilon^2.$$

We may express the joint convergence in probability of $B_0$, $B_1$, $B_2$ to their expectations in the form

$$\Pr\{|B_0| < \delta^2,\ B_1 < -\tfrac{1}{2}i(\theta_0),\ |B_2| < 2|K(\theta_1)|\} > 1 - \epsilon,$$

where $\delta$, $\epsilon$ are any two small positive numbers. Let $F$ denote the set of points for which this inequality holds; then for every point in $F$, we have that

$$|B_0 + \tfrac{1}{2}B_2\delta^2| < (1 + |K(\theta_1)|)\,\delta^2$$

and

$$B_1\delta < -\tfrac{1}{2}i(\theta_0)\,\delta.$$


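If $\theta = \theta_0 \pm \delta$ are two such points in $F$, then from (15) we have for these points

$$\frac{1}{n}\frac{dL}{d\theta} = B_0 \pm \delta B_1 + \tfrac{1}{2}\delta^2B_2.$$

Provided that $\delta$ is so defined that

$$\delta < \tfrac{1}{2}i(\theta_0)/(1 + |K(\theta_1)|),$$

then for $\theta = \theta_0 + \delta$,

$$\frac{1}{n}\frac{dL}{d\theta} < (1 + |K(\theta_1)|)\left\{\delta - \tfrac{1}{2}i(\theta_0)(1 + |K(\theta_1)|)^{-1}\right\}\delta < 0,$$

whereas for $\theta = \theta_0 - \delta$,

$$\frac{1}{n}\frac{dL}{d\theta} > (1 + |K(\theta_1)|)\left\{\tfrac{1}{2}i(\theta_0)(1 + |K(\theta_1)|)^{-1} - \delta\right\}\delta > 0.$$

We see from (6) that $dL/d\theta$ is a continuous function of $\theta$ in $R$, and it follows therefore that the likelihood equation (7) has a root in $\theta_0 \pm \delta$ with probability tending to 1. This establishes the consistency of the estimator $T$ of $\theta$.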


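(2) Asymptotic normality of maximum-likelihood estimator

Let the consistent solution of the likelihood equation be $T$; then from equation (15), we have that

$$\left(\frac{dL}{d\theta}\right)_T = \left(\frac{dL}{d\theta}\right)_{\theta_0} + (T - \theta_0)\left(\frac{d^2L}{d\theta^2}\right)_{\theta_0} + \tfrac{1}{2}(T - \theta_0)^2\left(\frac{d^3L}{d\theta^3}\right)_{\theta_1} = 0,$$

or

$$B_0 + (T - \theta_0)B_1 + \tfrac{1}{2}(T - \theta_0)^2B_2 = 0,$$

so that

$$(T - \theta_0) = -B_0/\{B_1 + \tfrac{1}{2}(T - \theta_0)B_2\}.$$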

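We have seen that the expectation of $B_0$ is zero; its variance can be evaluated, again using the method described in Fréchet (1952, p. 73):

$$V(B_0) = E(B_0^2) = \frac{1}{n^2}E\left\{\sum\frac{n_{ij}}{p_{ij}}\frac{dp_{ij}}{d\theta}\right\}^2_{\theta_0}. \tag{18}$$

We treat each $n_{ij}$ as the sum of the $n$ variates $X_{ij}^{(r)}$ which take the values 1 or 0 at each of the $n$ transitions $r = 1, \dots, n$, depending on whether these transitions are or are not from the state $E_i$ to the state $E_j$. We then have

$$\begin{aligned}
n^2V(B_0) &= E\left\{\sum_r\sum_{i,j}\frac{X_{ij}^{(r)}}{p_{ij}}\frac{dp_{ij}}{d\theta}\right\}^2 \\
&= E\left\{\sum_r\sum_{i,j}\frac{(X_{ij}^{(r)})^2}{p_{ij}^2}\left(\frac{dp_{ij}}{d\theta}\right)^2
+ \sum_r{\sum_{i,j}}'{\sum_{l,m}}'\frac{X_{ij}^{(r)}X_{lm}^{(r)}}{p_{ij}p_{lm}}\frac{dp_{ij}}{d\theta}\frac{dp_{lm}}{d\theta}
+ 2\sum_{r<t}\sum_{i,j}\sum_{l,m}\frac{X_{ij}^{(r)}X_{lm}^{(t)}}{p_{ij}p_{lm}}\frac{dp_{ij}}{d\theta}\frac{dp_{lm}}{d\theta}\right\},
\end{aligned}$$

where $\sum_r$, $\sum_{i,j}$, $\sum_{l,m}$ indicate summation over all possible values $r = 1, \dots, n$; $i, j$ and $l, m = 1, \dots, s$, but where ${\sum}'_{i,j}$, ${\sum}'_{l,m}$ indicate summation over those values of $i, j$ and $l, m = 1, \dots, s$, for which at least one of the conditions $i \ne l$, $j \ne m$ holds. Since for these values of $i, j$ and $l, m$, we have that $X_{ij}^{(r)}X_{lm}^{(r)} = 0$, and further $\sum_m\frac{dp_{lm}}{d\theta} = 0$ for all $l$, then this is equal to

$$\sum_r\sum_{i,j}\frac{P_i}{p_{ij}}\left(\frac{dp_{ij}}{d\theta}\right)^2 + 2\sum_{r<t}\sum_{i,j}\sum_{l,m}\frac{P_ip_{ij}\,p_{jl}^{(t-r-1)}\,p_{lm}}{p_{ij}p_{lm}}\frac{dp_{ij}}{d\theta}\frac{dp_{lm}}{d\theta} = n\sum_{i,j}\frac{P_i}{p_{ij}}\left(\frac{dp_{ij}}{d\theta}\right)^2.$$

It follows that $V(B_0) = -n^{-1}E(B_1) = n^{-1}i(\theta_0)$, so that

$$(T - \theta_0)\{i(\theta_0)\,n\}^{\frac{1}{2}} = \frac{B_0\{n/i(\theta_0)\}^{\frac{1}{2}}}{-\dfrac{B_1}{i(\theta_0)} - \dfrac{(T - \theta_0)B_2}{2\,i(\theta_0)}}$$

has zero expectation and unit variance, for the denominator converges to 1 in probability, and the numerator has zero expectation and unit variance. If we can further show that $B_0$ is asymptotically normally distributed, then $T$ will be asymptotically normally distributed with expectation $\theta_0$ and variance $\{i(\theta_0)\,n\}^{-1}$.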

Now from (16) we see that $B_0$ is a linear function of the $n_{ij}$, so that it is sufficient to prove that as $n \to \infty$, these are jointly normally distributed. This has been proved, for the ergodic chains considered, by Bartlett (1951), who in his work also implied some of the properties of maximum-likelihood estimators developed in detail in this paper. The proof is briefly outlined below; one difference, however, is that it is no longer necessary to postulate $\ln\mu_1(t) \ne 0$, since it was shown in § III that for the chains considered, this must be so for some $t$.

Following Montroll (1947) and Bartlett (1951), we define the moment generating function of the $n_{ij}$ as

$$M(t) = E\left(\exp\sum t_{ij}n_{ij}\right) = \sum_S P_\alpha\,p_{\alpha\beta}e^{t_{\alpha\beta}} \cdots p_{\psi\omega}e^{t_{\psi\omega}},$$

the sum extending over all sequences $S$ of $n+1$ states $E_\alpha, E_\beta, \dots, E_\psi, E_\omega$. If $R$ is the square matrix whose transpose $R'$ is defined by (10), this is

$$M(t) = 1'R^nP,$$

where $1'$ denotes the row vector $(1, \dots, 1)$, and $P$ is the column vector of the stationary probabilities $P_i$. If $R$ has the $s$ latent roots $\mu_1(t), \dots, \mu_s(t)$, we have seen that for small $t$, $\mu_1(t)$ is the single dominant root such that $\mu_1(0) = 1$. Following Frazer, Duncan & Collar (1947, § 4·15), we have that Sylvester's theorem gives for $n \to \infty$,

$$R^n \sim \{\mu_1(t)\}^n Z(\mu_1),$$

where the matrix $Z(\mu_1)$ is finite. It follows that

$$M(t) \sim \{\mu_1(t)\}^n\,1'Z(\mu_1)P,$$

so that

$$\ln M(t) \sim n\ln\mu_1(t),$$

a non-zero value, since for an ergodic chain, for some small $t$ it has been shown that $\mu_1(t) \ne 1$.
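The matrix form of the moment generating function can be checked by brute force for a small chain. The Python sketch below computes $1'R^nP$ by repeated multiplication and compares it with the direct sum over all sequences of $n+1$ states, each weighted by its stationary probability. The matrices `p` and `t`, the value of `n` and the function names are hypothetical.

```python
import itertools
import math

# Hypothetical two-state chain, its stationary vector, and arbitrary t_ij.
p = [[0.9, 0.1], [0.4, 0.6]]
P = [0.8, 0.2]
t = [[0.1, -0.2], [0.3, 0.05]]
s, n = 2, 4


def mgf_matrix(p, P, t, n):
    """M(t) = 1' R^n P, computed as the row vector P' multiplied n times
    by R' = (p_ij exp(t_ij)) and then summed."""
    Rt = [[p[i][j] * math.exp(t[i][j]) for j in range(s)] for i in range(s)]
    vec = list(P)
    for _ in range(n):
        vec = [sum(vec[i] * Rt[i][j] for i in range(s)) for j in range(s)]
    return sum(vec)


def mgf_bruteforce(p, P, t, n):
    """E exp(sum t_ij n_ij), summed over all sequences of n + 1 states."""
    total = 0.0
    for seq in itertools.product(range(s), repeat=n + 1):
        prob, exponent = P[seq[0]], 0.0
        for a, b in zip(seq, seq[1:]):
            prob *= p[a][b]
            exponent += t[a][b]
        total += prob * math.exp(exponent)
    return total


m_matrix = mgf_matrix(p, P, t, n)
m_brute = mgf_bruteforce(p, P, t, n)
zero = [[0.0, 0.0], [0.0, 0.0]]
print(m_matrix, m_brute, mgf_matrix(p, P, zero, n))
```

With all $t_{ij} = 0$ the function reduces to 1, as a moment generating function must.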




$M(t)$ is thus asymptotically equivalent to the moment generating function of a sum of $n$ identical independent vector variates. Since for an ergodic chain, the $n_{ij}$ have variances of order $n$ so that the $\sigma_{ij}^2n^{-1}$ are finite and non-zero, they will tend to simultaneous normality as $n \to \infty$ by the Central Limit Theorem for $n = \sum_{i,j} n_{ij}$ variates. It follows that $B_0$ will be asymptotically normally distributed.

Finally, a general convergence theorem of Cramér (1946) enables us to conclude that $(T - \theta_0)\{i(\theta_0)\,n\}^{\frac{1}{2}}$ is asymptotically normally distributed with zero expectation and unit variance; this is that if a variate $\xi_n$ with distribution function $f_n(x)$ tending to $f(x)$ and a variate $\eta_n$ converging in probability to a positive constant $c$ as $n \to \infty$ exist, then the distribution function of $\xi_n/\eta_n$ tends to $f(cx)$.

Since

$$\{i(\theta_0)\,n\}^{-1} = \left\{E\left(\frac{dL}{d\theta}\right)^2\right\}^{-1}_{\theta_0}$$

is the minimum possible variance, we see clearly that the consistent solution of the maximum-likelihood equation is fully efficient.
IV. SUFFICIENCY CONDITIONS 
A further general theorem of maximum-likelihood estimation, proved under certain
conditions by Cramér (1946, § 33.2) for continuous distributions, applies equally with minor
modifications to the case of the Markov chain. It is that:

If a sufficient estimator $T$ of the parameter $\theta$ exists, any solution of the likelihood equation
will be a function of $T$. In obtaining the maximum-likelihood estimator $T$ of $\theta$, the parameter
defining the transition probabilities $p_{ij}(\theta)$ of a Markov chain, it may therefore be of
interest to determine the form of the transition probabilities which admit a sufficient
statistic for $\theta$.

Koopman (1936) and Pitman (1936) have given the general form of continuous distributions
admitting a sufficient statistic, but although it has been obvious that a similar form
exists for discrete distributions, no account of the proof for this appears to be available.
We briefly outline the proof, use it to obtain the form of the probabilities $p_1(\theta), \dots, p_k(\theta)$
in the particular case of the multinomial distribution, and finally generalize our results in
the case of the simple ergodic Markov chain.


(1) The form of the likelihood function admitting a sufficient estimator $T$ of $\theta$,
for discrete parent distributions with constant variate intervals


We consider those discrete distributions $p(x, \theta)$ for which the variate $x$ is defined at equal
intervals. We can then, without loss of generality, assume the interval to be unity, so that
$x$ takes consecutive integral values in any specific range, finite or infinite; this includes the
standard discrete distributions for which $x$ is a frequency, as well as others for which $x$ may
take positive or negative integral values.

Let $x_1, \dots, x_n$ be $n$ independent observations of a variate $x$ with the discrete distribution
$p(x, \theta)$, where $x$ takes a set of consecutive integral values in a given range. The probability
of this sample is
$$\phi(x_1, \dots, x_n; \theta) = p(x_1, \theta) \cdots p(x_n, \theta).$$


If this is to admit a sufficient estimator $T$ of $\theta$, then the factorizability condition
$$\phi(x_1, \dots, x_n; \theta) = f(x_1, \dots, x_n)\, F(T, \theta)$$


352 Maximum-likelihood estimator of an unknown parameter in a Markov chain 
must hold, or for the likelihood function $L = \ln \phi$,
$$L = \ln f(x_1, \dots, x_n) + \ln F(T, \theta). \tag{19}$$


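Assuming that the functions $p(x_i, \theta)$, $F(T, \theta)$ are differentiable with respect to $\theta$, we obtain
the equation
$$\sum_{i=1}^{n} \frac{\partial}{\partial\theta}\{\ln p(x_i, \theta)\} = \frac{\partial}{\partial\theta}\{\ln F(T, \theta)\} = G(T, \theta). \tag{20}$$
Since this holds for all values of $\theta$ within the range allowed, a particular value $\theta_0$ lying in
this range will give, when substituted in this equation,
$$u = \sum_{i=1}^{n} u(x_i) = g(T), \tag{21}$$
which connects $T$ with the statistic
$$u = \sum_{i=1}^{n} u(x_i) = \sum_{i=1}^{n} \Big(\frac{\partial}{\partial\theta} \ln p(x_i, \theta)\Big)_{\theta = \theta_0}.$$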


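Now since sufficiency is a property which holds for all values of the sample $x_1, \dots, x_n$, we may
allow $x_i$, a particular discrete sample value, to increase by 1, while the remaining $x_1, x_2, \dots,
x_{i-1}, x_{i+1}, \dots, x_n$ remain unchanged. We can then difference equation (21) with respect to
$x_i$ to obtain
$$\Delta_{x_i} u = \Delta_{x_i} u(x_i) = \Delta_{x_i} g(T).$$
Similarly for equation (20)
$$\Delta_{x_i} \frac{\partial}{\partial\theta} \ln p(x_i, \theta) = \Delta_{x_i} G(T, \theta),$$
so that for all values of $i$, we have the equation
$$\frac{\Delta_{x_i} \dfrac{\partial}{\partial\theta} \ln p(x_i, \theta)}{\Delta_{x_i} u(x_i)} = \frac{\Delta_{x_i} G(T, \theta)}{\Delta_{x_i} g(T)}. \tag{22}$$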
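Now if $G(T, \theta)$ and $g(T)$ are assumed differentiable with respect to $T$, we have that
$$\Delta_{x_i} G(T, \theta) = G(T(x_1, \dots, x_{i-1}, x_i + 1, x_{i+1}, \dots, x_n), \theta) - G(T(x_1, \dots, x_i, \dots, x_n), \theta)
= (\Delta_{x_i} T) \Big(\frac{\partial G}{\partial T}\Big)_{T = T_\theta},$$
where $T_\theta$ is some function $T_\theta(x_1, \dots, x_n; \theta)$ of the sample values and of $\theta$ such that
$T_\theta$ lies between $T$ and $T + \Delta_{x_i} T$. Further, since the estimator $T$ is a symmetrical function of the $x_i$, the function
$T_\theta$ will be the same in the case of every differencing with respect to each $x_i$, $i = 1, \dots, n$.
Similarly,
$$\Delta_{x_i} g(T) = (\Delta_{x_i} T) \Big(\frac{\partial g}{\partial T}\Big)_{T = T_0(x_1, \dots, x_n; \theta_0)},$$
where $T_0(x_1, \dots, x_i, \dots, x_n; \theta_0)$ likewise lies between $T$ and $T + \Delta_{x_i} T$.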
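It follows that equation (22) can be written for all values of $i$ as
$$\frac{\Delta_{x_i} \dfrac{\partial}{\partial\theta} \ln p(x_i, \theta)}{\Delta_{x_i} u(x_i)} = \frac{(\partial G/\partial T)_{T = T_\theta}}{(\partial g/\partial T)_{T = T_0(x_1, \dots, x_n; \theta_0)}},$$
which is equal to a fixed value, not depending on $i$. Hence
$$\frac{\Delta_{x_1} \dfrac{\partial}{\partial\theta} \ln p(x_1, \theta)}{\Delta_{x_1} u(x_1)} = \frac{\Delta_{x_2} \dfrac{\partial}{\partial\theta} \ln p(x_2, \theta)}{\Delta_{x_2} u(x_2)} = \dots = \frac{\Delta_{x_n} \dfrac{\partial}{\partial\theta} \ln p(x_n, \theta)}{\Delta_{x_n} u(x_n)}.$$
This can only be so if these are all equal to a function of $\theta$ only, $K_1(\theta)$. It follows from
(22) that
$$\Delta_{x_i} G(T, \theta) = K_1(\theta)\, \Delta_{x_i} g(T),$$
so that
$$G(T, \theta) = K_1(\theta) g(T) + K_2(\theta).$$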
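On integrating this with respect to $\theta$, we obtain
$$\ln F(T, \theta) = \Lambda_1(\theta) g(T) + \Lambda_2(\theta) + \Lambda_3(T),$$
so that the likelihood (19) is of the form
$$L = \ln f(x_1, \dots, x_n) + \Lambda_1(\theta) g(T) + \Lambda_2(\theta), \tag{23}$$
the term in $T$ alone being absorbed into $\ln f$, and the probability of the sample is of the form
$$\phi(x_1, \dots, x_n; \theta) = f(x_1, \dots, x_n) \exp[\Lambda_1(\theta) g(T) + \Lambda_2(\theta)].$$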
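A familiar instance of the exponential form above is the Poisson distribution, which is not discussed in the original text and is used here purely as an assumed illustration: with $p(x, \theta) = e^{-\theta}\theta^x/x!$ one may take $\ln f = -\sum \ln x_i!$, $\Lambda_1(\theta) = \ln\theta$, $g(T) = T = \sum x_i$ and $\Lambda_2(\theta) = -n\theta$. The sketch below checks the practical consequence of sufficiency: two samples of the same size with the same $T$ have log-likelihoods differing by a constant free of $\theta$. The function names are hypothetical.

```python
import math

def poisson_loglik(sample, theta):
    """Log-likelihood of independent Poisson observations with mean theta."""
    return sum(x * math.log(theta) - theta - math.lgamma(x + 1) for x in sample)

def loglik_gap(sample_a, sample_b, thetas):
    """Difference of log-likelihoods of two samples over a grid of theta values."""
    return [poisson_loglik(sample_a, th) - poisson_loglik(sample_b, th) for th in thetas]

thetas = [0.5, 1.0, 2.0, 5.0]
same_T = loglik_gap([1, 2, 3], [0, 0, 6], thetas)   # both samples have T = 6
diff_T = loglik_gap([1, 2, 3], [0, 0, 5], thetas)   # T = 6 against T = 5
```

Under these assumptions every entry of `same_T` equals $\ln(6!/(1!\,2!\,3!)) = \ln 60$, whereas the entries of `diff_T` change with $\theta$.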
(2) Form of the probabilities $p_i(\theta)$ for a multinomial distribution
admitting a sufficient estimator $T$ of $\theta$
Consider the $k$ mutually exclusive events $E_1, E_2, \dots, E_k$ with non-zero probabilities
$p_1(\theta), \dots, p_k(\theta)$ such that $\sum_{i=1}^{k} p_i(\theta) = 1$. In a total of $n$ independent trials in which there are
$x_1$ occurrences of the event $E_1$, $x_2$ of the event $E_2$, ..., and $x_k$ of the event $E_k$, the probability
of the sample of frequencies $x_1, x_2, \dots, x_k$ is
$$\phi(x_1, \dots, x_k; \theta) = p_1^{x_1}(\theta) \cdots p_k^{x_k}(\theta).$$


For this to admit a sufficient estimator, the likelihood $L = \ln \phi$ must be of the form (23),
so that
$$L = \sum_{i=1}^{k} x_i \ln p_i(\theta) = \ln f(x_1, \dots, x_k) + \Lambda_1(\theta) g(T) + \Lambda_2(\theta).$$
We note in this case that the term $\Lambda_2(\theta)$ of (23) involves the number of trials $n = \sum_{i=1}^{k} x_i$, so that it can be written
$$\Lambda_2(\theta) \sum_{i=1}^{k} x_i,$$
where $\Lambda_2(\theta)$ now denotes a function of $\theta$ only. Also from the form of the likelihood, we see that the first
term can only be of the form
$$\ln f(x_1, \dots, x_k) = \sum_{i=1}^{k} A_i x_i.$$
We can therefore write the likelihood function admitting a sufficient estimator $T$ of $\theta$ as
$$L = \sum_i x_i \ln p_i(\theta) = \sum_i A_i x_i + \Lambda_1(\theta) g(T) + \Lambda_2(\theta) \sum_i x_i, \tag{24}$$
which holds for all values of $n$, and of the frequency values $x_1, \dots, x_k$. If, in a total of $n + 1$
independent trials, the frequency of each event $E_1, E_2, \dots, E_k$, apart from the
event $E_i$, remains unchanged, while the event $E_i$ occurs $x_i + 1$ times, then we may
difference equation (24) with respect to $x_i$, to obtain
$$\ln p_i(\theta) = A_i + \Lambda_1(\theta)\, \Delta_{x_i} g(T) + \Lambda_2(\theta),$$
or
$$\{\Lambda_1(\theta)\}^{-1} \{\ln p_i(\theta) - A_i - \Lambda_2(\theta)\} = \Delta_{x_i} g(T).$$
That is, a function of $\theta$ only equals a function of the sample values $x_i$, which is possible only
if both equal a constant $K_i$.
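It follows that
$$p_i(\theta) = \exp[K_i \Lambda_1(\theta) + \Lambda_2(\theta) + A_i] = \alpha_i \exp[K_i \Lambda_1(\theta) + \Lambda_2(\theta)] \quad (i = 1, \dots, k), \tag{25}$$
where $A_i = \ln \alpha_i$, is the form of the probability allowing a sufficient estimator $T$ of $\theta$.
A condition to which the probabilities $p_i(\theta)$ are subject is
$$\sum_{i=1}^{k} p_i(\theta) = 1,$$
so that in our case, from (25)
$$\sum_i \alpha_i \exp[K_i \Lambda_1(\theta) + \Lambda_2(\theta)] = 1,$$
or
$$\sum_i \alpha_i \exp[K_i \Lambda_1(\theta)] = \exp[-\Lambda_2(\theta)].$$
An alternative way of writing the probabilities $p_i(\theta)$ is therefore
$$p_i(\theta) = \alpha_i \exp[K_i \Lambda_1(\theta)] \Big(\sum_t \alpha_t \exp[K_t \Lambda_1(\theta)]\Big)^{-1}
= \alpha_i A_1^{K_i}(\theta) \Big\{\sum_t \alpha_t A_1^{K_t}(\theta)\Big\}^{-1}, \tag{26}$$
where $A_1(\theta) = \exp[\Lambda_1(\theta)]$.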

We can now tell immediately in any practical case whether it is possible to obtain a suffi- 
cient estimator or not by an examination of the form of the multinomial probabilities. Three 
simple illustrations from trinomial distributions are given as examples. 

Example IV. 2.1. The trinomial distribution with probabilities
$$p_1 = \theta(\theta + \theta^2 + 2\theta^3)^{-1}, \quad p_2 = \theta^2(\theta + \theta^2 + 2\theta^3)^{-1}, \quad p_3 = 2\theta^3(\theta + \theta^2 + 2\theta^3)^{-1},$$
of the form (26) with $\alpha_i$ equal to 1, 1, 2 respectively and $K_i$ equal to 1, 2, 3, where $A_1(\theta) = \theta$,
will give an estimator of $\theta$ which is sufficient.
Example IV. 2.2. Probabilities of the type
$$p_1 = \theta, \quad p_2 = 2\theta, \quad p_3 = 1 - 3\theta,$$
which can be written in the form (25)
$$p_i = \alpha_i \exp[K_i \ln\{\theta(1 - 3\theta)^{-1}\} + \ln(1 - 3\theta)],$$
where the $\alpha_i$ equal 1, 2, 1, and the $K_i$ equal 1, 1, 0, respectively, will also give an estimator of $\theta$
which is sufficient.


Example IV. 2.3. However, probabilities of the form
$$p_1 = \tfrac{1}{4}(2 + \theta), \quad p_2 = \tfrac{1}{2}(1 - \theta), \quad p_3 = \tfrac{1}{4}\theta,$$
which occur in genetics, cannot be written in the forms (25) or (26), and can be verified to
admit no sufficient estimator of $\theta$.
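The contrast between the first and third illustrations can be explored numerically. The following is only a sketch under stated assumptions: for the first trinomial the statistic is taken to be $T = \sum K_i x_i = x_1 + 2x_2 + 3x_3$ (with $n$ fixed), and the two frequency vectors compared are arbitrary choices sharing $n = 2$ and $T = 4$. If $T$ is sufficient, the difference of their log-likelihoods cannot depend on $\theta$. For the genetic probabilities the same pair is used merely to show one likelihood ratio that does depend on $\theta$; this is not a proof that no sufficient estimator exists. All function names are illustrative.

```python
import math

def probs_example_1(theta):
    """Trinomial probabilities proportional to theta, theta^2, 2 theta^3."""
    d = theta + theta ** 2 + 2 * theta ** 3
    return [theta / d, theta ** 2 / d, 2 * theta ** 3 / d]

def probs_genetics(theta):
    """Genetic trinomial: (2 + theta)/4, (1 - theta)/2, theta/4, for 0 < theta < 1."""
    return [(2 + theta) / 4, (1 - theta) / 2, theta / 4]

def loglik(freqs, probs):
    """Sum of x_i ln p_i, the likelihood used in the text (no multinomial coefficient)."""
    return sum(x * math.log(p) for x, p in zip(freqs, probs))

def gap_spread(probs_fn, x_a, x_b, thetas):
    """Range, over theta, of the log-likelihood difference between two samples."""
    gaps = [loglik(x_a, probs_fn(th)) - loglik(x_b, probs_fn(th)) for th in thetas]
    return max(gaps) - min(gaps)

thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
x_a, x_b = (1, 0, 1), (0, 2, 0)        # both have n = 2 and x1 + 2*x2 + 3*x3 = 4
spread_1 = gap_spread(probs_example_1, x_a, x_b, thetas)
spread_g = gap_spread(probs_genetics, x_a, x_b, thetas)
```

Here `spread_1` is zero up to rounding, while `spread_g` is clearly positive.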




(3) Form of the transition probabilities $p_{ij}(\theta)$ for a simple ergodic
Markov chain admitting a sufficient estimator $T$ of $\theta$
Consider the realization of the simple ergodic Markov chain with $s$ possible states $E_1, \dots, E_s$
which results in the observed sequence $S$ of the $n + 1$ states $E_a, E_b, \dots, E_k, E_l$ in this order.
Assuming the initial state $E_a$ to be fixed,* the probability distribution of the sequence $S$ is
given by (2), and the likelihood can be written as equation (3). An ergodic chain (see
footnote, § III) is such that for any row $i$ of the transition matrix, and for all values of $j$,
$$0 \le p_{ij}(\theta) < 1,$$
so that, in particular, for $i = j$ no state $E_i$ is an absorption state; some, though not all,
$p_{ij}(\theta)$ in a row may equal zero, subject to the usual conditions (1) that
$$\sum_{j=1}^{s} p_{ij}(\theta) = 1 \quad (i = 1, 2, \dots, s).$$
j=1 


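Now if the chain admits a sufficient estimator $T$ of $\theta$, we see that the likelihood may be written, much
in the same way as for the multinomial case,
$$L = \sum_{i,j} n_{ij} \ln p_{ij}(\theta) = \sum_{i,j} A_{ij} n_{ij} + \Lambda_1(\theta) g(T) + \Lambda_2(\theta) \sum_{i,j} n_{ij}, \tag{27}$$
where the $A_{ij}$ are constants, the term $\Lambda_2(\theta)$ of (23) is equal to the function $\Lambda_2(\theta) \sum n_{ij}$ in this case, and
$$\sum_{i,j} n_{ij} = n.$$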
Before proceeding, as in the similar case of the multinomial, to increase the number of
trials in the realization of the chain by one, so as to allow this equation to be differenced
with respect to the $n_{ij}$ in order to obtain the non-zero values of the $p_{ij}(\theta)$, we must consider
some difficulties which this method presents. Since to any zero transition probability $p_{ij}(\theta)$
there will necessarily correspond a zero value of the transition frequency $n_{ij}$, neither will
appear in the likelihood function (27). We are concerned with the $n_{ij}$ corresponding to non-zero
values of $p_{ij}(\theta)$, and differencing occurs only with respect to these; however, owing to
the set of relations (8) for the sums of transition frequencies, it is not always possible in
increasing the number of trials in a realization of the chain by 1, from $n + 1$ to $n + 2$, to
increase each of the $n_{ij}$ associated with a non-zero transition probability $p_{ij}(\theta)$ singly by 1,
while leaving the values of the remaining $n_{ij}$ unchanged.

As a simple example to illustrate this point, we consider the case of a chain with two
states $E_1$ and $E_2$, for which a possible realization of three trials, starting with the fixed
state $E_1$, results in the sequence $E_1, E_1, E_2$. Assuming all the transition probabilities to


* We are specifically considering the case of sufficiency for a distribution conditional on the given
initial state $E_a$. In the general case where any state $E_i$ may occur initially, the likelihood is
$$L = \sum_{i=1}^{s} z_i \ln P_i + \sum_{i,j} n_{ij} \ln p_{ij}(\theta),$$
where the variates $z_i$ are 1 or 0 according to whether the state $E_i$ is or is not the initial state. In addition
to the usual conditions for the $n_{ij}$, the $z_i$ are subject to the condition
$$\sum_{i=1}^{s} z_i = 1.$$
This more difficult case will not be considered.


have non-zero values, we see immediately that the transition frequency matrix $n$, whose
elements are the $n_{ij}$, is
$$n = \begin{pmatrix} n_{11} & n_{12} \\ n_{21} & n_{22} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix};$$
a realization of four trials starting with $E_1$ will give, among others, the following transition
frequency matrices:
$$\begin{pmatrix} 2 & 1 \\ 0 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
for the sequences $E_1, E_1, E_1, E_2$; $E_1, E_1, E_2, E_1$; and $E_1, E_1, E_2, E_2$. Here we see that
$n_{11}$, $n_{21}$, $n_{22}$ have singly been increased by 1, while the remaining $n_{ij}$ remain constant; it is,
however, in no way possible, starting with the given matrix $n$, to increase $n_{12}$ by 1, while
leaving $n_{11}$, $n_{21}$, $n_{22}$ unchanged.
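The impossibility just described can be confirmed by brute force. The sketch below, an illustration rather than part of the original argument, enumerates every sequence of four states of a two-state chain starting with $E_1$, collects the resulting transition frequency matrices, and tests which single increments of the matrix for $E_1, E_1, E_2$ are attainable. The helper names are hypothetical.

```python
from itertools import product

def transition_counts(seq, s=2):
    """Transition frequency matrix (n_ij) of a state sequence, states numbered from 1."""
    counts = [[0] * s for _ in range(s)]
    for a, b in zip(seq, seq[1:]):
        counts[a - 1][b - 1] += 1
    return tuple(tuple(row) for row in counts)

def reachable_count_matrices(length, start=1, s=2):
    """All transition frequency matrices of sequences of the given length from `start`."""
    states = range(1, s + 1)
    return {
        transition_counts((start,) + tail, s)
        for tail in product(states, repeat=length - 1)
    }

def bump(matrix, i, j):
    """Copy of `matrix` with n_ij increased by 1 (i, j numbered from 1)."""
    rows = [list(row) for row in matrix]
    rows[i - 1][j - 1] += 1
    return tuple(tuple(row) for row in rows)

base = transition_counts((1, 1, 2))        # the matrix for E1, E1, E2
four = reachable_count_matrices(4)         # all matrices from four-state sequences
attainable = {
    (i, j): bump(base, i, j) in four
    for i in (1, 2) for j in (1, 2)
}
```

The dictionary `attainable` marks the increments of $n_{11}$, $n_{21}$ and $n_{22}$ as possible and that of $n_{12}$ as impossible, in agreement with the text.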

In order to resolve this difficulty, we use the fact that for an ergodic chain, providing the
number of trials $n + 1$ is sufficiently large in a realization starting with the given state $E_a$,
it is possible to end with any one of the states $E_1, \dots, E_s$. In our particular case, assuming
a sufficiently large number of trials, the realization of the chain leads to the sequence $S$
which starts with the fixed state $E_a$ and ends with the state $E_l$; a further trial may result in
the sequence $S'$ with an additional state $E_r$, for all $r$ for which $p_{lr}(\theta) \neq 0$. The probability of
this sequence is
$$\phi(S') = p_{ab}(\theta) \cdots p_{kl}(\theta)\, p_{lr}(\theta),$$
and the likelihood function $L' = \ln \phi(S')$ is identical with $L = \ln \phi(S)$ except that the
transition frequency $n_{lr}$ in it is increased to $n_{lr} + 1$. This allows us to difference the
equation (27) with respect to $n_{lr}$, for all $r$ for which $p_{lr} \neq 0$, in order to obtain
$$\ln p_{lr}(\theta) = A_{lr} + \Lambda_1(\theta)\, \Delta_{n_{lr}} g(T) + \Lambda_2(\theta),$$
where $\Delta_{n_{lr}} g(T)$ is some function of the transition frequencies $n_{ij}$. Now since
$$\{\Lambda_1(\theta)\}^{-1} \{\ln p_{lr}(\theta) - \Lambda_2(\theta) - A_{lr}\} = \Delta_{n_{lr}} g(T),$$
that is, a function of $\theta$ equals a function of the $n_{ij}$, then both must equal a constant $K_{lr}$.
This allows us to write for all $r$ for which $p_{lr} \neq 0$,
$$p_{lr}(\theta) = \alpha_{lr} \exp[K_{lr} \Lambda_1(\theta) + \Lambda_2(\theta)],$$
where $A_{lr} = \ln \alpha_{lr}$. If we further admit zero values of $\alpha_{lr}$, we may accept this equation as the
general form for both zero and non-zero values of $p_{lr}$ $(r = 1, 2, \dots, s)$. Since, for a sufficiently
large number of trials in a realization of the chain starting with the given state $E_a$, the final
state $E_l$ may be such that $l$ can take any of the values $1, 2, \dots, s$, it follows that we may write
as the general form of the transition probabilities $p_{ij}(\theta)$ admitting a sufficient estimator
$T$ of $\theta$, the equation
$$p_{ij}(\theta) = \alpha_{ij} \exp[K_{ij} \Lambda_1(\theta) + \Lambda_2(\theta)]. \tag{28}$$
This general form of the $p_{ij}(\theta)$ has been obtained for sufficiently large values of $n = \sum n_{ij}$;
this means that in this case the necessary condition that a sufficient estimator of $\theta$ exist
is that the $p_{ij}(\theta)$ be of form (28). It is easily verified that the sufficient condition also follows:
that if the $p_{ij}(\theta)$ are of the form (28), the estimator of $\theta$ is sufficient. This, however, holds
generally, irrespective of the size of $n$. It is of interest to note that if we restrict ourselves
to the case where all the $p_{ij}(\theta)$ are non-zero, all values of $n \ge 1$ are sufficiently large for the
necessary condition to hold, since any final state $E_l$ may be reached in a realization of one
or more trials starting with the given state $E_a$. This means that for the ergodic chain with
non-zero $p_{ij}(\theta)$, a sufficient estimator of $\theta$ exists for any number of trials $n \ge 1$, if and only
if the $p_{ij}(\theta)$ are of the form (28).


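The transition probabilities (28) are subject to the condition (1) that $\sum_{j=1}^{s} p_{ij} = 1$, so that
$$\sum_{j=1}^{s} \alpha_{ij} \exp[K_{ij} \Lambda_1(\theta)] = \exp[-\Lambda_2(\theta)],$$
or
$$\sum_{j} \alpha_{ij} \exp[K_{ij} \Lambda_1(\theta)] = \sum_{j} \alpha_{rj} \exp[K_{rj} \Lambda_1(\theta)] \quad (i, r = 1, 2, \dots, s). \tag{29}$$
If we write $A_1(\theta) = \exp[\Lambda_1(\theta)]$, we see that the $p_{ij}(\theta)$ can also be put in the form
$$p_{ij}(\theta) = \alpha_{ij} A_1^{K_{ij}}(\theta) \Big\{\sum_{j} \alpha_{ij} A_1^{K_{ij}}(\theta)\Big\}^{-1}, \tag{30}$$
where $\{\sum_{j} \alpha_{ij} A_1^{K_{ij}}(\theta)\}^{-1}$ has the same value for all rows $i = 1, 2, \dots, s$.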


Two possible cases now exist for those exponents among $K_{i1}, K_{i2}, \dots, K_{is}$ associated with
the non-zero probabilities among $p_{i1}, p_{i2}, \dots, p_{is}$ in the $i$th row of the transition probability
matrix $p$; the first in which all such exponents are distinct, and the second in which only
some such exponents are distinct. For simplicity, we assume that in the $i$th row, all transition
probabilities $p_{ij}(\theta)$ $(j = 1, 2, \dots, s)$ are non-zero; the slightly more general case where
some $p_{ij}(\theta)$ may be zero follows without difficulty.

Consider the first case, where for the particular row $i$ there are $s$ exponents $K_{ij}$, all distinct,
associated with the $s$ non-zero transition probabilities $p_{ij}(\theta)$ $(j = 1, 2, \dots, s)$; then, from
equation (29), it follows that for any other row $r$,
$$\sum_{j} \alpha_{ij} A_1^{K_{ij}} = \sum_{j} \alpha_{rj} A_1^{K_{rj}}. \tag{31}$$
This is possible only if the exponents $K_{ij}$ and $K_{rj}$ $(j = 1, 2, \dots, s)$ have the same $s$ distinct
values, though these may be arranged in different orders. This means that there is only
a single set of $s$ distinct values
$$\alpha_{i1} A_1^{K_{i1}} \Big\{\sum_j \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1}, \ \dots, \ \alpha_{is} A_1^{K_{is}} \Big\{\sum_j \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1}$$
for the transition probabilities of the matrix $p$, and these must appear, possibly in different
orders, in every row of the matrix.
Two simple examples for Markov chains with two and three states respectively will
illustrate the previous points:

Example IV. 3.1. The transition probability matrix of the form
$$p = \begin{pmatrix} \theta & 1 - \theta \\ 1 - \theta & \theta \end{pmatrix}$$
will always provide a sufficient estimator of $\theta$, for we may write $p_{1j}(\theta)$ $(j = 1, 2)$ in the form
$$p_{1j}(\theta) = \alpha_{1j} \exp[K_{1j} \ln\{\theta(1 - \theta)^{-1}\} + \ln(1 - \theta)],$$
where the values of $\alpha_{11}, \alpha_{12}$ and $K_{11}, K_{12}$ are 1, 1 and 1, 0 respectively. The second row consists
of the same transition probabilities in a different order.


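Example IV. 3.2.
$$p = \begin{pmatrix}
2\theta(2\theta + \theta^2 + \theta^3)^{-1} & \theta^2(2\theta + \theta^2 + \theta^3)^{-1} & \theta^3(2\theta + \theta^2 + \theta^3)^{-1} \\
\theta^3(2\theta + \theta^2 + \theta^3)^{-1} & 2\theta(2\theta + \theta^2 + \theta^3)^{-1} & \theta^2(2\theta + \theta^2 + \theta^3)^{-1} \\
\theta^2(2\theta + \theta^2 + \theta^3)^{-1} & \theta^3(2\theta + \theta^2 + \theta^3)^{-1} & 2\theta(2\theta + \theta^2 + \theta^3)^{-1}
\end{pmatrix}$$
also provides a sufficient estimator of $\theta$, for in (30) we have $A_1(\theta) = \theta$, and the values of
$\alpha_{1j}$ and $K_{1j}$ are 2, 1, 1 and 1, 2, 3 respectively. The second and third rows consist of the same
values for the transition probabilities, but in different orders.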
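The structure of this example can be examined with a short computation. The sketch below is an illustration under assumptions: the second and third rows are taken to be cyclic rearrangements of the first, the states are numbered from 0, and the statistic is taken as $T = \sum K_{ij} n_{ij}$. It builds the matrix from form (30) with $A_1(\theta) = \theta$ and compares sequences with a common initial state and length. The names `ALPHA`, `K`, `statistic` and `gap_spread` are hypothetical.

```python
import math

ALPHA = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]   # assumed coefficients alpha_ij
K = [[1, 2, 3], [3, 1, 2], [2, 3, 1]]       # assumed exponents K_ij

def transition_matrix(theta):
    """Rows alpha_ij * theta^K_ij, each divided by its row sum, as in form (30)."""
    rows = []
    for alpha_row, k_row in zip(ALPHA, K):
        terms = [a * theta ** k for a, k in zip(alpha_row, k_row)]
        norm = sum(terms)               # equals 2*theta + theta^2 + theta^3 in every row
        rows.append([term / norm for term in terms])
    return rows

def loglik(seq, theta):
    """Log-likelihood of a state sequence, conditional on its initial state."""
    p = transition_matrix(theta)
    return sum(math.log(p[a][b]) for a, b in zip(seq, seq[1:]))

def statistic(seq):
    """T = sum of K_ij over the observed transitions."""
    return sum(K[a][b] for a, b in zip(seq, seq[1:]))

def gap_spread(seq_a, seq_b, thetas):
    """Range, over theta, of the log-likelihood difference between two sequences."""
    gaps = [loglik(seq_a, th) - loglik(seq_b, th) for th in thetas]
    return max(gaps) - min(gaps)

thetas = [0.2, 0.4, 0.6, 0.8]
same_t = gap_spread((0, 2, 2), (0, 1, 2), thetas)   # both sequences have T = 4
diff_t = gap_spread((0, 1, 1), (0, 0, 0), thetas)   # T = 3 against T = 2
```

With these assumed rows, `same_t` vanishes up to rounding, so the likelihood ratio of two sequences sharing $T$ is free of $\theta$, whereas `diff_t` does not vanish.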

Consider now the second case for the exponents. Here, for the particular row $i$, the
$s$ exponents $K_{ij}$ are not all distinct; let the first $k$ exponents $(k < s)$ be identical
$$K_{i1} = K_{i2} = \dots = K_{ik} = K,$$
and the remaining $K_{i,k+1}, \dots, K_{is}$ be distinct, so that there are altogether $s - k + 1$ distinct
values for the exponents, and the transition probabilities for row $i$ are therefore
$$p_{ij}(\theta) = \alpha_{ij} A_1^{K} \Big\{\sum_{j=1}^{s} \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1} \quad (j = 1, 2, \dots, k),$$
$$p_{ij}(\theta) = \alpha_{ij} A_1^{K_{ij}} \Big\{\sum_{j=1}^{s} \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1} \quad (j = k + 1, \dots, s). \tag{32}$$
(The slightly more general case of several groups of exponents with identical values, and
the remaining exponents distinct
$$K_{i1} = K_{i2} = \dots = K_{ik} = K',$$
$$K_{i,k+1} = K_{i,k+2} = \dots = K_{iu} = K'',$$
$$K_{i,u+1} \neq K_{i,u+2} \neq \dots \neq K_{is},$$
presents no difficulty different from those met in the simpler case mentioned, and will not
be considered.) Now since for any row $r$ equation (31) holds, there are $s - k + 1$ distinct
values of $K_{rj}$ which are identical with the distinct values of the $K_{ij}$, so that in every row of
the stochastic matrix $p$ we have, apart from coefficients, transition probabilities of only
the following $s - k + 1$ distinct forms
$$A_1^{K} \Big\{\sum_j \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1}, \ A_1^{K_{i,k+1}} \Big\{\sum_j \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1}, \ \dots, \ A_1^{K_{is}} \Big\{\sum_j \alpha_{ij} A_1^{K_{ij}}\Big\}^{-1}, \tag{33}$$
appearing in various arrangements. The $s$ coefficients associated with these $s - k + 1$
distinct forms for the transition probabilities may also differ from row to row, but are subject
to the condition which follows from (31) that for all rows $r = 1, 2, \dots, s$, the sums of the
coefficients for each distinct form in (33) must equal the fixed values
$$\sum_{j=1}^{k} \alpha_{ij}, \ \alpha_{i,k+1}, \ \dots, \ \alpha_{is} \tag{34}$$
respectively.
Two simple examples of this case for Markov chains with three states follow. 
Example IV. 3.3.
20(30--03)- — 0(30--09)3  0*(304-03)-1 
p- | 10*30--0))3 30%(30-+6%)-1 aan.) 
110(30 + 03)-1 — 0*(30--69)- 4.0(30 4-0) 


where in equation (32), $A_1(\theta) = \theta$, and the two distinct forms for the transition probabilities
appearing in each row are
$$\theta(3\theta + \theta^3)^{-1}, \quad \theta^3(3\theta + \theta^3)^{-1}.$$
The values of the coefficients for the form $\theta(3\theta + \theta^3)^{-1}$ vary from row to row, subject to the
condition (34) that their sum is always equal to 3, while those for the form $\theta^3(3\theta + \theta^3)^{-1}$ have
a sum always equal to 1.
Example IV. 3.4.
1-6 gy 
т" 
41-0) 410-6) Ө 


where in equation (32), $A_1(\theta) = \theta(1 - \theta)^{-1}$, and the two distinct forms for the transition
probabilities appearing in each row are
$$\theta, \quad (1 - \theta),$$
with exponents equal to 1 and 0 respectively.
Again the values of the coefficients for $\theta$ vary from row to row, subject to condition (34) that
their sum is always equal to 1, while those for $(1 - \theta)$ also have a sum equal to 1.


I am greatly indebted to Prof. P. A. P. Moran for suggesting the field of parameter estima- 
tion in simple Markov chains as a subject for research, and for his useful critical comments 
throughout all stages of the work. I should also like to thank the referee for several helpful
suggestions.

REFERENCES 


BARTLETT, M. S. (1951). The frequency goodness of fit test for probability chains. Proc. Camb. Phil.
Soc. 47, 86-95.

CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton University Press.

FRAZER, R. A., DUNCAN, W. J. & COLLAR, A. R. (1947). Elementary Matrices. New York: Macmillan.

FRÉCHET, M. (1952). Traité du Calcul des Probabilités et de ses Applications, tome 2, fasc. III. Gauthier-
Villars.

HUZURBAZAR, V. S. (1948). The likelihood equation, consistency and maxima of the likelihood
function. Ann. Eugen., Lond., 14, 185-200.

KOOPMAN, B. O. (1936). On distributions admitting a sufficient statistic. Trans. Amer. Math. Soc.
39, 399-409.

MONTROLL, E. W. (1947). On the theory of Markoff chains. Ann. Math. Statist. 18, 18-36.

PITMAN, E. J. G. (1936). Sufficient statistics and intrinsic accuracy. Proc. Camb. Phil. Soc. 32, 567-79.

RAO, C. R. (1952). Advanced Statistical Methods in Biometric Research. New York: John Wiley.




SIGNIFICANCE TESTS FOR DISCRIMINANT FUNCTIONS AND 
LINEAR FUNCTIONAL RELATIONSHIPS 


By E. J. WILLIAMS
Division of Mathematical Statistics, C.S.I.R.O., Melbourne 


I. INTRODUCTION 


In previous papers (Bartlett, 1951; Williams, 1952a) certain exact tests for the adequacy
of a hypothetical discriminant function were derived. Later papers (Williams, 1952b,
1952c, 1953) showed how these tests could be applied in a number of situations of practical 
usefulness. The first object of the present paper is to extend the work in the above-men- 
tioned papers and to show how the results obtained may be interpreted in terms of multiple 
linear regression. The calculations may indeed be carried out in the manner of a covariance 
analysis. The second object is to develop, along the same lines, exact tests for an assumed 
linear relationship among variables; this is a problem which has been discussed in various 
contexts by Koopmans (1937), Tintner (1945, 1946, 1950), Geary (1948, 1949), Bartlett 
(1948), Anderson (1951) and others. Since the question of determining underlying relation- 
ships has been given considerable attention in the literature from a number of different 
points of view, the opportunity is taken also to discuss and to attempt to unify the different 
approaches made—the use of information provided by instrumental variates, by grouping 
of the data, and by higher moments. 

The reason for discussing discriminant functions and functional relationships in the
same paper is that the two problems are really different aspects of the same problem.
This has been well demonstrated by Geary (1948). If a single discriminant function is
assumed adequate to describe differences among a number of $p$-variate populations, this
assumption is equivalent to assuming that there exist $p - 1$ linear relations among the means
for the $p$ variates; the means then lie on a line. In general, postulating that the differences
among the populations are described by $r$ discriminant functions is equivalent to postulating
$p - r$ linear relationships (provided always that the number of populations considered is
not less than $p$). The quantity $r$, the number of dimensions in which the population means
lie, may be called the rank of the populations, and $p - r$ the degeneracy.

Thus the test for a single linear relationship is equivalent to the test for the adequacy of
$p - 1$ discriminant functions. In deriving significance tests for either a discriminant function
or a linear relationship, the same principle is applied, though the function being tested 
has a different role in the two cases, and thus enters differently into the tests. In the simple 
bivariate case, the test for a linear relationship is exactly the same as the test for the 
discriminant function which is orthogonal to it. 

The problems of this paper have been framed above in terms of an analysis of variance 
model, for testing the significance of differences between populations. This has been done 
in order to link them with those discussed in the earlier work (Williams, 1952a, b). A more
general specification is in terms of a regression model, wherein the interrelationships between
a set of $p$ variates and another set of $q$ variates are investigated. In such a model a
discriminant function is better described as a canonical variate. Throughout the remainder of


this paper the problems will be posed in terms of the regression model but interpreted also 
in terms of analysis of variance. Though both specifications are formally equivalent it is 
easier sometimes to work with one and sometimes with the other. 

Bartlett (1951) has shown that the different aspects of the adequacy of a discriminant 
function may be tested by means of a factorization of a certain determinantal ratio. A general
calculus of such factorizations is developed, and it is shown how it can be applied in a 
number of the tests described in this paper. 

The method of treatment used earlier (Williams, 1952a) in deriving exact significance tests 
was to frame the questions asked of the data somewhat differently from those usually posed. 

A hypothetical discriminant function being proposed, the question is whether this function
is concordant with the data. Since the hypothetical discriminant function leads to a sufficient 
statistic for the unknown population parameter, tests can be derived which are independent 
of the value of the parameter. These tests are the counterparts of those based on the latent 
roots (largest or smallest) of a matrix, but have the advantage of being exact, even in small 
samples. The same approach is adopted in this paper, in deriving tests both for discriminant 
functions and functional relationships. 

The most important practical results given in the present paper are that the tests pre- 
viously given for the adequacy of a hypothetical discriminant function, and the tests now 
given for a functional relationship, can be reduced to tests derived by a covariance analysis 
of the data. The computations are thereby simplified because it is not necessary to deter- 
mine the latent roots of a matrix equation, the test functions being expressed either as 
ordinary adjusted sums of squares in a covariance analysis, or as the ratios of determinants 
of sums of squares and products. If, however, the ‘best’ discriminant function yielded by 
the sample (or the ‘best’ linear relationship) is to be evaluated, it is still necessary to 
evaluate the largest (smallest) latent root and the corresponding latent vector. 


II. NOTATION, TERMINOLOGY AND PRELIMINARY RESULTS 


Since this paper covers a rather wide field it has been thought desirable to set out clearly 
the notation to be used throughout. While an attempt has been made to use notation in 
conformity with previous work on the subject, some departures have been made in the 
interests of clarity. 

We consider two sets of variates,

$X_i$ $(i = 1, 2, \dots, p)$;
$Y_j$ $(j = 1, 2, \dots, q)$,

measured on each of $n + 1$ individuals. The $X_i$ will be assumed to have a non-singular joint
normal distribution, with linear regression on the $Y_j$. Formally, this is equivalent to $q + 1$
populations with the same $p$-variate normal distribution of values about the means in
each, so that the variation in the sample may be analysed into the $q$ degrees of freedom
between and $n - q$ within groups.

The problem of discriminant analysis is to determine from a set of data, either that linear function
or set of linear functions of the $X_i$ which have greatest correlation with the $Y_j$, or the adequacy
of a set of given discriminant functions. On the other hand, the problem of determining
functional relationships is to find the set of linear functions, if any, of the $X_i$ which are
uncorrelated with the $Y_j$, or to test a given set of such functions for their correlation with
the $Y_j$. Linear functions which are uncorrelated with the $Y_j$, either by hypothesis or in the
sample, will be called null functions, and their sample values null variates.

Although from the formal correlation point of view there is a symmetry between the
$X_i$ and the $Y_j$, the tests of significance developed here are always of a function (either a
discriminant function or a null function) of the variables of one of the sets (in this paper
the $X_i$), and so are not symmetrical in the two sets. In this respect the present approach
differs from that based on the canonical correlations alone, in which the magnitudes of the
canonical correlations are used to decide the number of discriminant functions (or, in other
words, the rank and degeneracy of the population correlations). Reiersol (1945) emphasizes
the distinction between the two sets of variates by designating the $Y_j$ ‘instrumental’
variates to distinguish them from the ‘investigational’ variates $X_i$.

The canonical variates of the two sets will be denoted by $x_i$, $y_i$ and the corresponding
squared canonical correlations by $\theta_i$. The hypothetical discriminant function will be denoted
by $\xi$, and its squared multiple correlation with the $Y_j$, or discriminant ratio, by A.

The tests of significance for a discriminant function have been shown in earlier papers
to depend only on the original discriminant ratios $\theta_i$ and the new discriminant ratios after
the effect of $\xi$ has been eliminated by covariance, which will be denoted by $\phi_i$.

It will be shown similarly in this paper that the tests for a linear functional relationship, 
that is, for a given null function, depend on sets of original and new latent roots; the new 
latent roots which will be denoted by 0; are, however, defined differently from those used 
in discriminant analysis. 

The following notation will be used:

Sums of squares and products:

$t_{ij}$   sum of products of $X_i$ and $X_j$
$p_{ij}$   sum of products of $X_i$ and $Y_j$
$u_{ij}$   sum of products of $Y_i$ and $Y_j$
$b_{ij}$   sum of products of $X_i$ and $X_j$ for regression or between-group line of analysis of variance
$w_{ij}$   sum of products of $X_i$ and $X_j$ for residuals from regression or within groups.

Also
$t_{i\xi}$   sum of products of $\xi$ and $X_i$
$t_{\xi\xi}$   sum of squares of $\xi$

and similarly for other sums of squares and products with the suffix $\xi$.
The corresponding matrices will be denoted by capitals. Then the following results are
readily obtained, relating the different expressions occurring in the tests of significance:
$$B + W = T,$$
$$B = P U^{-1} P', \qquad |B|/|T| = \prod_i \theta_i;$$
$$W = T - P U^{-1} P', \qquad |W|/|T| = \prod_i (1 - \theta_i).$$
The matrix of sums of squares and products of all the $p + q$ variates will be denoted by $S$:
$$S = \begin{pmatrix} T & P \\ P' & U \end{pmatrix}.$$
Armed with these results, we are in a position to set out expressions in terms either of the
regression model or the analysis of variance model.




In the derivation of tests of significance by means of covariance analysis, use is made of
sums of squares and products of the $X_i$ after adjustment by certain covariance variates.
Since there seems to be no consistent terminology for these quantities, we propose to use
the term ‘adjusted’ only for quantities which are adjusted without any loss of degrees of
freedom, and the term ‘reduced’ for quantities for which the degrees of freedom are reduced
by the number of covariance variates. Thus, the analysis of variance of any $X_i$ would give
the following partition of degrees of freedom:


Difference of regressions on covariance variates 
Regression, reduced 


Regression, adjusted 


III. THE FACTORIZATION OF DETERMINANTAL RATIOS


The overall likelihood criterion for testing the rank of the population from which a sample 


has been drawn is, in s of the sample latent roots, equal to 


, П (1-0). 


e del 


364 Tests for discriminant functions and linear functional relationships 


In terms of determinants of sums of squares and products it may also be expressed as 
[7 |/12| = |T—PU-P'|T | 
[S| 
= ттүү, 1 

TTT 
and so may be represented as a determinantal ratio without calculation of the latent roots. 
The criterion may be expressed as the ratio of two determinants of order p, with n − q and
n degrees of freedom respectively; and since the X_i and the Y_j enter symmetrically into the
expression, it may also be expressed as the ratio of two determinants of order q, with degrees
of freedom n − p and n. It will accordingly be denoted

(n; p, q)

to indicate its dependence on the three sets of degrees of freedom. In particular, for p = 1,
the ratio of a residual sum of squares with n − q degrees of freedom to a total sum of squares
with n degrees of freedom is denoted (n; 1, q).
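The determinantal identity underlying (1), namely |T − PU⁻¹P'| / |T| = |S| / (|T| |U|), can be checked numerically. The following is a minimal sketch in plain Python; the matrices T, U and P are made-up illustrative values (not data from the paper), and the helper names `det`, `inverse`, `matmul` and `transpose` are hypothetical.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n, d = len(a), 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[piv][c] == 0.0:
            return 0.0
        if piv != c:
            a[c], a[piv] = a[piv], a[c]
            d = -d  # a row swap changes the sign
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def inverse(m):
    """Inverse by Gauss-Jordan elimination (m is assumed non-singular)."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        a[c] = [v / pv for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(r) for r in zip(*a)]


# Made-up blocks: T is p x p, U is q x q, P is p x q (here p = 2, q = 3).
T = [[4.0, 1.0], [1.0, 3.0]]
U = [[5.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 3.0]]
P = [[1.0, 0.5, 0.2], [0.3, 1.0, 0.4]]

B = matmul(matmul(P, inverse(U)), transpose(P))           # B = P U^{-1} P'
W = [[T[i][j] - B[i][j] for j in range(2)] for i in range(2)]
PT = transpose(P)
S = [T[i] + P[i] for i in range(2)] + [PT[j] + U[j] for j in range(3)]

ratio_w = det(W) / det(T)                # |W| / |T|
ratio_s = det(S) / (det(T) * det(U))     # |S| / (|T| |U|)
```

The two ratios agree to rounding error, which is why the criterion can be obtained from determinants alone, without extracting latent roots.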


Now if s is an integer less than q, a sum of squares with q degrees of freedom can be partitioned
into two independent sums of squares with s and q − s degrees of freedom. Corresponding
to this partition, the ratio of sums of squares with n − q and n degrees of freedom
can be factorized into two independent factors: one with n − s and n degrees of freedom,
the other with n − q and n − s degrees of freedom; so that

(n; 1, q) = (n; 1, s)(n − s; 1, q − s). (2)

In exactly the same way, determinantal ratios may be factorized, giving, for example,

(n; p, q) = (n; p, s)(n − s; p, q − s), (3)

or, more generally,

(n; p, q) = (n; p, s)(n − s; r, q − s)(n − r − s; p − r, q − s). (4)


These results are the basis of the factorizations of likelihood criteria given by Bartlett 
(1951), and will be applied repeatedly throughout this paper. 

As in univariate analysis, the first factor in (3) is seen to be the likelihood criterion for
the simple effect of the s variables, the remaining q − s being ignored, while the second factor
corresponds to the partial effect of the q − s variables after the elimination of the first s.
The partial factor is accordingly the one which is to be used in significance tests. Just as in
univariate analysis, there are always two alternative factorizations, depending on which
set out of the q variables is to be eliminated and which is to be the subject of test. Thus

(n; p, q) = (n; p, s)(n − s; p, q − s)
          = (n − q + s; p, s)(n; p, q − s). (5)


When p = 1, the tests for the partial factors may be simultaneously set out in the form of
an analysis of variance:

Effect                    Degrees of freedom    Sum of squares

Partial s variates        s                     1 − (n; 1, s)
Partial q − s variates    q − s                 1 − (n; 1, q − s)
Residual                  n − q                 (n; 1, q)
Total                     n                     1


The possibility of thus arranging the tests arises from the fact that (for example)

$$\frac{1-(n;\,1,\,s)}{(n;\,1,\,q)} = \frac{1-(n-q+s;\,1,\,s)}{(n-q+s;\,1,\,s)}. \qquad (6)$$
The moments of the likelihood criterion may be determined by means of this factorization.
Taking s = 1 and factorizing successively, we have

(n; p, q) = (n; p, 1)(n − 1; p, 1)(n − 2; p, 1) ... (n − q + 1; p, 1).

It is readily shown that the tth moment about zero of (n; p, 1) is

$$E[(n;\,p,\,1)^t] = \frac{\Gamma[\tfrac12(n-p)+t]\,\Gamma(\tfrac12 n)}{\Gamma[\tfrac12(n-p)]\,\Gamma(\tfrac12 n+t)},$$

so that the moments of the likelihood criterion are given by

$$E\{(n;\,p,\,q)^t\} = \prod_{i=0}^{q-1} \frac{\Gamma[\tfrac12(n-p-i)+t]\,\Gamma[\tfrac12(n-i)]}{\Gamma[\tfrac12(n-p-i)]\,\Gamma[\tfrac12(n-i)+t]} \qquad (7)$$

(cf. Bartlett (1938), equation (26)).
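The moment formula (7) is easy to evaluate with log-gamma functions. The sketch below is illustrative only: the function names `moment_p1` and `moment` are hypothetical, and the numerical arguments are arbitrary. It also allows two checks implied by the text: for p = 1 the mean of (n; 1, q) should equal (n − q)/n, and the moments should be unchanged when p and q are interchanged.

```python
from math import exp, lgamma


def moment_p1(n, p, t):
    """t-th moment about zero of a (n; p, 1) variable."""
    return exp(
        lgamma(0.5 * (n - p) + t) + lgamma(0.5 * n)
        - lgamma(0.5 * (n - p)) - lgamma(0.5 * n + t)
    )


def moment(n, p, q, t):
    """t-th moment about zero of (n; p, q), using the successive
    factorization (n; p, q) = (n; p, 1)(n - 1; p, 1)...(n - q + 1; p, 1)."""
    result = 1.0
    for i in range(q):
        result *= moment_p1(n - i, p, t)
    return result


mean_example = moment(20, 1, 3, 1)        # expected (20 - 3) / 20
sym_a = moment(20, 2, 3, 1.5)
sym_b = moment(20, 3, 2, 1.5)             # same value with p and q interchanged
```

Working on the log scale avoids overflow of the gamma function for large n.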
An important particular case is the limiting one when n tends to infinity. Writing

$$(p,\,q) = \lim_{n\to\infty} (n;\,p,\,q)^{n},$$

we find that the factorization (3) becomes

(p, q) = (p, s)(p, q − s), (8)

and it can be shown that −log (p, q) is distributed as a sum of squares with pq degrees of
freedom (cf. Williams, 1952a, p. 26).

When n tends to infinity, the two factorizations given above are identical, and are equivalent
to a simple partition of a sum of squares, as is, indeed, evident from the analysis
set out above for the particular case p = 1.

An application. As a simple example of the application of these methods we consider the 
case where r of the population latent roots are known to be unity, and the vanishing of 
the remaining population roots is under test.

If r population roots are unity, it follows that θ_1, θ_2, ..., θ_r are all unity, and the
corresponding canonical variates x_1, x_2, ..., x_r are equal (apart from possible changes of
sign) to y_1, y_2, ..., y_r, the canonical variates of the second set. Therefore, in order to examine
the residual variation of the x_i, the p variates of the first set may be replaced by any p − r
of them not linearly dependent on the first r of the [...] residual degrees of freedom. For
their regression on the y_i, [...] the condition of orthogonality with the first r x_i [...]
regression has but q − r degrees of freedom. Accordingly, the residual likelihood criterion
for testing the existence of the non-unit latent roots, namely,

$$\prod_{i=r+1}^{p} (1-\theta_i) \qquad (9)$$


is a (n − r; p − r, q − r) variable. In particular, if r = p − 1 (p ≤ q), the single sample latent
root θ_p may be tested by an F-test:

$$1-\theta_p = (n-p+1;\ 1,\ q-p+1),$$

$$F = \frac{(n-q)\,\theta_p}{(q-p+1)(1-\theta_p)}, \qquad (10)$$

with q − p + 1 and n − q degrees of freedom.

Corresponding results hold when q < p.

These results have practical application where, even if the first r population canonical
correlations are not unity, the sample latent roots are close to unity. Then the elimination of
the corresponding sample canonical variates leads to a residual likelihood criterion (9)
with approximately the distribution given. Such a result has already been discussed by
Bartlett (1947) and others.
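The F-test (10) for the single remaining latent root is a one-line computation. A minimal sketch follows; the function name `f_smallest_root` and the numerical inputs are hypothetical, chosen only for illustration.

```python
def f_smallest_root(theta_p, n, p, q):
    """Variance ratio (10) for the smallest sample latent root theta_p,
    valid under the stated condition that the other p - 1 roots are unity
    (or, approximately, close to unity) and p <= q.

    Returns (F, df1, df2) with df1 = q - p + 1 and df2 = n - q.
    """
    df1 = q - p + 1
    df2 = n - q
    f = (df2 * theta_p) / (df1 * (1.0 - theta_p))
    return f, df1, df2


# Illustrative values only: n = 30, p = 2, q = 4, theta_p = 0.2.
f_value, df1, df2 = f_smallest_root(0.2, 30, 2, 4)
```

The resulting ratio would then be referred to tables of F with the returned degrees of freedom.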


IV. REVIEW AND EXTENSION OF PREVIOUS WORK 


Bartlett (1951) derived tests of significance for the goodness of fit of a single hypothetical 
discriminant function, in terms of its direction, and of departures of the data from col- 
linearity. These results generalized some results of Williams (1952a) for the particular case 
p = 2 (two variates) and q = 2 (two degrees of freedom between groups). However, two 
points were not clarified in this work. First, while Williams used for test functions the 
partial factors, Bartlett suggested that the use of partial factors was not necessary in these 
tests. It should be clear from the discussion in § III that the partial factors are appropriate 
for significance tests. 

Secondly, it was stated by Bartlett that the general test functions could be expressed 
in terms of the original latent roots and the new latent roots after adjustment for the 
hypothetical discriminant function, but the expressions were not given. These results, 
which are given below, show the relationship of the exact tests to the approximate tests 
based on the individual ordered latent roots. Apart from this, they are of more analytic 
than practical interest, since the calculation of all the latent roots, before and after adjust-
ment, is laborious. Moreover, in § V of this paper, simplified computational procedures
based on the analysis of covariance will be given, which make possible the ready application 
of the exact tests without the calculation of latent roots. 


Expression of test functions in terms of canonical correlations 


For this section, we express all results in terms of the canonical variates x_i, y_i and the
canonical correlations. Let the hypothetical discriminant function which is under test be

$$\xi = \sum m_i x_i, \quad \text{with} \quad \sum m_i^2 = 1,$$

so that the corresponding discriminant ratio is

$$\lambda = \sum m_i^2 \theta_i.$$

The first factorization of the overall likelihood criterion, representing the elimination of
the effect of the hypothetical discriminant function, is

$$\prod_i (1-\theta_i) = (1-\lambda) \prod_j (1-\phi_j),$$

corresponding to (n; p, q) = (n; 1, q)(n − 1; p − 1, q), and leading by the elimination of the
factor 1 − λ for the hypothetical discriminant function to the residual likelihood criterion

$$\prod_j (1-\phi_j).$$

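Bartlett (1951) gives two factorizations of the residual likelihood criterion, providing
tests for simple direction and partial collinearity (equation (7) of his paper) and for partial
direction and simple collinearity (equation (20) of his paper). In our notation, these factorizations
may be written

$$\prod_j (1-\phi_j) = \prod_i (1-\theta_i)/(1-\lambda) \qquad [(n-1;\ p-1,\ q)]$$

$$= \frac{1-\sum m_i^2\theta_i^2/\lambda}{1-\lambda} \times \frac{\prod_i (1-\theta_i)}{1-\sum m_i^2\theta_i^2/\lambda} \qquad (11)$$

with factors: simple direction, (n − 1; p − 1, 1); partial collinearity, (n − 2; p − 1, q − 1);

$$= \frac{\lambda}{(1-\lambda)\sum m_i^2\theta_i/(1-\theta_i)} \times \frac{\prod_i (1-\theta_i)\,\sum m_i^2\theta_i/(1-\theta_i)}{\lambda} \qquad (12)$$

with factors: partial direction, (n − q; p − 1, 1); simple collinearity, (n − 1; p − 1, q − 1).

The degrees of freedom for these factors agree with the values given by Bartlett. Since
the first factor of (12) has unity for one of its degrees of freedom, it may be tested by an
ordinary F-test.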
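As a numerical illustration of the two factorizations (11) and (12), the sketch below computes all four factors from a set of latent roots θ_i and direction cosines m_i and confirms that each pair multiplies to the residual criterion Π(1 − θ_i)/(1 − λ). The function name `residual_factors` and the input values are hypothetical; the formulas are those of (11) and (12) as written above.

```python
from math import prod


def residual_factors(theta, m):
    """Factors of the residual likelihood criterion for a hypothetical
    discriminant function with direction cosines m (sum of m_i^2 = 1)."""
    lam = sum(mi * mi * th for mi, th in zip(m, theta))           # lambda
    s2 = sum(mi * mi * th * th for mi, th in zip(m, theta))       # sum m^2 theta^2
    big = sum(mi * mi * th / (1.0 - th) for mi, th in zip(m, theta))
    overall = prod(1.0 - th for th in theta)                      # prod(1 - theta)
    return {
        "residual": overall / (1.0 - lam),
        # factorization (11)
        "simple_direction": (1.0 - s2 / lam) / (1.0 - lam),
        "partial_collinearity": overall / (1.0 - s2 / lam),
        # factorization (12)
        "partial_direction": lam / ((1.0 - lam) * big),
        "simple_collinearity": overall * big / lam,
    }


# Illustrative values only (p = 3); the m_i^2 are 4/9, 4/9, 1/9.
factors = residual_factors([0.6, 0.3, 0.1], [2.0 / 3.0, 2.0 / 3.0, 1.0 / 3.0])
```

In either factorization it is the partial factor (partial collinearity in (11), partial direction in (12)) that the text recommends for the significance test.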

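It now remains to express the m_i² in terms of the original and new latent roots. From
Williams (1952a, p. 20) we have, in the present notation, the equation for the φ_j,

$$\sum_i \frac{m_i^2(1-\theta_i)}{\theta_i-\phi_j} = 0. \qquad (13)$$

On expressing this as an equation of degree p − 1 in φ_j we have

$$\phi_j^{\,p-1}\sum_i m_i^2(1-\theta_i) - \phi_j^{\,p-2}\sum_i m_i^2(1-\theta_i)\Big(\sum_k\theta_k-\theta_i\Big) + \dots = 0,$$

i.e.

$$\phi_j^{\,p-1}(1-\lambda) - \phi_j^{\,p-2}\Big[\sum\theta_i\,(1-\lambda) - \lambda + \sum m_i^2\theta_i^2\Big] + \dots = 0.$$

Hence

$$\sum\phi_j = \sum\theta_i - \Big(\lambda - \sum m_i^2\theta_i^2\Big)\Big/(1-\lambda),$$

so that

$$\Big(1-\sum m_i^2\theta_i^2/\lambda\Big)\Big/(1-\lambda) = \Big(\sum\theta_i-\sum\phi_j\Big)\Big/\lambda,$$

and the factors (11) become

$$\frac{\sum\theta_i-\sum\phi_j}{\lambda} \times \frac{\lambda\prod_j(1-\phi_j)}{\sum\theta_i-\sum\phi_j}.$$

The second factor provides the test for collinearity. It takes its maximum value
$\prod_{i=2}^{p}(1-\theta_i)$ when λ = θ_1. The use of this product therefore overestimates the
significance of departures from collinearity. When p = 2 it reduces to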
(1—6,) (1—03) E. 
EP = [1+0/(1-0) 01 —v,)] 1 (notation of Williams, 1952a), 
(1-0 (1=) | [ДА 
1-A 20€ ) Sade 
and is thus a function of the variance ratio previously given for testing collinearity. 


When q = 2 the second factor of (11) becomes 


а-6)(1—6,) _ 


r 1-Q/P (notation of Williams, 1952a) 
= [1+v,/(1—v,) (1—v2)]7, 


which again corresponds to the test function given in Williams (1952a, p. 26).
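For the alternative factorization it is convenient to write

$$\Theta_i = \theta_i/(1-\theta_i),$$
$$\Phi_j = \phi_j/(1-\phi_j),$$
$$\Lambda = \lambda/(1-\lambda).$$

Then equation (13) becomes

$$\sum_i \frac{m_i^2}{\Theta_i-\Phi_j} = 0 \qquad (j = 1, 2, \dots, p-1),$$

so that

$$\Phi_j^{\,p-1} - \Phi_j^{\,p-2}\Big(\sum\Theta_i - \sum m_i^2\Theta_i\Big) + \dots = 0.$$

Hence

$$\sum m_i^2\Theta_i = \sum\Theta_i - \sum\Phi_j,$$

and the factors (12) become

$$\frac{\Lambda}{\sum\Theta_i-\sum\Phi_j} \times \frac{\prod_j(1-\phi_j)\,\big(\sum\Theta_i-\sum\Phi_j\big)}{\Lambda}. \qquad (12)$$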
The first factor provides the test for direction; being a (n − q; p − 1, 1) variable it may be
tested by means of an F-test with p − 1 and n − p − q + 1 degrees of freedom:

$$F = \frac{n-p-q+1}{p-1}\cdot\frac{1-(n-q;\ p-1,\ 1)}{(n-q;\ p-1,\ 1)}.$$
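The conversion of a (n − q; p − 1, 1) factor into a variance ratio can be sketched as follows. The function name `f_from_direction_factor` is hypothetical, and the argument `v` stands for the numerical value of the partial direction factor, however it has been obtained.

```python
def f_from_direction_factor(v, n, p, q):
    """Variance ratio for a (n - q; p - 1, 1) factor with observed value v.

    Returns (F, df1, df2) with df1 = p - 1 and df2 = n - p - q + 1.
    """
    df1 = p - 1
    df2 = n - p - q + 1
    f = (df2 / df1) * (1.0 - v) / v
    return f, df1, df2


# Illustrative values only: v = 0.8 with n = 30, p = 3, q = 4.
f_dir, df1_dir, df2_dir = f_from_direction_factor(0.8, 30, 3, 4)
```

A factor close to unity gives a small F, corresponding to close agreement in direction between the hypothetical function and the data.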

When λ is close to θ_1, so that the hypothetical discriminant function differs little in direction
from the first canonical variate, the first factor of (12) is approximately λ/θ_1. This provides
a justification of the approximate F-test

$$F = \frac{n-p-q+1}{p-1}\cdot\frac{\theta_1-\lambda}{\lambda}.$$

When p = 2 the first factor becomes 


0-06) (1—6,) 


(1=A)(1-0,0,/a) - 1 "v 


the test function previously given. 
When q = 2 the first factor reduces to


(1—v,) (1 — vs), 
again equivalent to the test function given previously. 




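The two factorizations may be shown in closer relationship by means of the following
representations, which make use of the fact that (1 − φ_j) = (1 + Φ_j)^{-1}:

$$\prod_j(1-\phi_j) = \frac{\sum\theta_i-\sum\phi_j}{\lambda}\times\frac{\lambda\prod_j(1-\phi_j)}{\sum\theta_i-\sum\phi_j},$$

$$\prod_j(1+\Phi_j)^{-1} = \frac{\Lambda}{\sum\Theta_i-\sum\Phi_j}\times\frac{\sum\Theta_i-\sum\Phi_j}{\Lambda\prod_j(1+\Phi_j)}.$$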


V. DERIVATION OF TESTS FOR DIRECTION AND COLLINEARITY 
BY COVARIANCE ANALYSIS 


It has been shown by Bartlett (1939) that a general test for the adequacy of a discriminant 
function is provided by the residual likelihood criterion after the elimination by covariance 
of the effect of the function. A point that has been overlooked, however, is that by a simple 
extension of this method the separate tests for direction and collinearity of the function may 
be derived. Bartlett there deals with the simple case of discrimination between two groups
(q = 1). In the treatment below we shall consider the regression model, in the general case
with q independent (or instrumental) variates.


(a) The case p = 2 
To introduce the method, we consider first two variates X_1 and X_2, and their correlation
with the q variates Y_1, Y_2, ..., Y_q. Let the hypothetical discriminant function be

$$\xi = l_1X_1 + l_2X_2.$$

To test the adequacy of ξ, we consider the residual correlation of the X_i with the Y_j after
the elimination of ξ by covariance. We may adjust either X_1, X_2 or any linear function of
them, for its dependence on ξ; the same analysis is achieved in any case, since ξ is itself a
linear function of X_1 and X_2.

For the overall test, the sum of squares, with q degrees of freedom, for regression on the
Y_j after adjustment for ξ, is tested against the reduced residual sum of squares, with n − q − 1
degrees of freedom. Now the adjusted regression sum of squares may be separated into two
parts: (i) one with q − 1 degrees of freedom, namely, the sum of squares for regression,
reduced by the elimination of the effect of ξ, and (ii) one with 1 degree of freedom, representing
the regression of the regression values on ξ.

In the analysis of variance model with q degrees of freedom between groups, the two
parts of the adjusted sum of squares between groups are (i) the reduced sum of squares
between groups, and (ii) the sum of squares for the difference of between-group and within-group
regressions.

Then it turns out that the reduced sum of squares provides a test of
the independence of the adjusted X_i and Y_j (or collinearity of populations in the analysis
of variance model), while the sum of squares for difference of regressions tests the direction of the proposed
discriminant function. The tests are equivalent to those given in the formal analysis
of variance presented previously (Williams, 1952a).

This equivalence may be established as follows.


With the notation given in § II, we can express the determinantal ratios in terms of
symmetric functions of the latent roots (discriminant ratios). Thus

$$|B|/|T| = \theta_1\theta_2,$$
$$|W|/|T| = (1-\theta_1)(1-\theta_2),$$

while, by definition, $b_{\xi\xi}/t_{\xi\xi} = \lambda$.


Then the sums of squares for X_i, after adjustment by covariance for ξ, may be expressed
in terms of determinantal ratios or latent roots as follows:

                                          Degrees      Sums of squares in terms of
                                          of freedom   Determinantal ratios          Latent roots

Difference of regressions (direction)     1            |T|/t_ξξ − |B|/b_ξξ − |W|/w_ξξ    [(θ_1 − λ)(λ − θ_2) / λ(1 − λ)] |T|/t_ξξ
Between-groups reduced (collinearity)     q − 1        |B|/b_ξξ                          (θ_1 θ_2 / λ) |T|/t_ξξ
Within-groups reduced                     n − q − 1    |W|/w_ξξ                          [(1 − θ_1)(1 − θ_2) / (1 − λ)] |T|/t_ξξ
Total reduced                             n − 1        |T|/t_ξξ                          |T|/t_ξξ


The right-hand column gives terms which are proportional to the quantities used in testing
direction and collinearity against a residual term. This analysis further justifies the procedure
given in the earlier papers, of testing the partial factors (i.e. terms corresponding
to the terms given in the above covariance analysis) rather than the simple factors.

This approach to the tests for the adequacy of a hypothetical discriminant function
greatly simplifies the computations. By this method it is not necessary to calculate the θ_i
nor even to determine the canonical variates x_i. A straightforward covariance analysis is
all that is required. However, if what is required is not a test of the adequacy of a given
discriminant function but the calculation of the most satisfactory discriminant function
from the data, then the canonical analysis is still necessary.


(b) The case p>2 
The procedure outlined above may be generalized for greater values of p. We take the
hypothetical discriminant function to be

$$\xi = l_1X_1 + l_2X_2 + \dots + l_pX_p,$$

and consider the analysis of covariance of the X_i on ξ. Since ξ is a linear function of the X_i,
the number of linearly independent variates after the elimination of ξ is reduced to p − 1.
As for the case p = 2, the sums of squares and products of the residual variates may be
separated into terms for direction and collinearity, the term for direction having a single
degree of freedom. From the p − 1 residual variates, a linear function may be made up in
the same way as a discriminant function for a comparison for which the sum of squares for
direction is relatively maximized. This sum of squares clearly has p − 1 degrees of freedom.
When the sum of squares is maximized relative to the reduced total sum of squares, we get
the sum of squares for the simple direction effect (collinearity being ignored) (11). When it
is maximized relative to the reduced sum of squares for residuals from regression on the
Y_j, we get the sum of squares for the partial direction effect (12). The analysis will be demonstrated
for the simple criterion for direction; the partial criterion may be similarly derived.

For any two of the variates X_i and X_j, the sum of products for direction may be derived
by difference in the manner given above (when p = 2), but may be expressed more simply
for the present purpose as

$$d_i d_j\, b_{\xi\xi} w_{\xi\xi}/t_{\xi\xi},$$

where

$$d_i = \frac{b_{i\xi}}{b_{\xi\xi}} - \frac{w_{i\xi}}{w_{\xi\xi}}.$$

Now consider some linear compound of the X_i, say

$$\zeta = l'_1X_1 + l'_2X_2 + \dots + l'_pX_p.$$

For this compound, the sum of squares for direction is

$$\Big(\sum_i l'_i d_i\Big)^2 b_{\xi\xi} w_{\xi\xi}/t_{\xi\xi},$$

while the total sum of squares is

$$\sum_i\sum_j l'_i l'_j t_{ij}.$$

To maximize the ratio of direction to total, which we denote by D, with respect to the
l'_i, we have

$$D\, t_{\xi\xi} \sum_j l'_j t_{ij} = d_i \sum_j l'_j d_j\, b_{\xi\xi} w_{\xi\xi}.$$

The scale of the l'_i being arbitrary, we may so choose it that

$$D\, t_{\xi\xi} = \sum_j l'_j d_j\, b_{\xi\xi} w_{\xi\xi}, \qquad (14)$$

whence

$$\sum_j l'_j t_{ij} = d_i. \qquad (15)$$

If we write the typical element of the inverse of T as t^{ij}, then the solutions of (15) are given
by

$$l'_i = \sum_j t^{ij} d_j.$$

Hence, from (14),

$$D = \sum_i\sum_j t^{ij} d_i d_j\, b_{\xi\xi} w_{\xi\xi}/t_{\xi\xi}.$$

It is to be noted that ζ is orthogonal to ξ. This may be obvious from its derivation;
alternatively, the total sum of products of ζ and ξ is

$$\sum_i\sum_j l'_i l_j t_{ij} = \sum_j l_j d_j = 0,$$


while the sum of products for direction is proportional to

$$\sum_j l_j d_j = 0.$$

The quantity D, being an invariant, may readily be expressed in terms of latent roots
in order to show the equivalence of the present results with those given in § IV. Transforming
to the canonical variates x_i, and putting as before

$$\xi = \sum m_i x_i,$$

we have

$$d_i = \frac{m_i(\theta_i-\lambda)}{\lambda(1-\lambda)},$$

$$D = \frac{\sum m_i^2(\theta_i-\lambda)^2}{\lambda(1-\lambda)} = \frac{\sum m_i^2\theta_i^2-\lambda^2}{\lambda(1-\lambda)},$$

so that

$$1-D = \frac{1-\sum m_i^2\theta_i^2/\lambda}{1-\lambda},$$

which agrees with (11).
To derive the test function for partial collinearity in terms of determinantal ratios, we
make use of the factorization (11).
The residual likelihood criterion

$$\prod_i(1-\theta_i)/(1-\lambda) = \frac{|W|\,t_{\xi\xi}}{|T|\,w_{\xi\xi}}.$$

Hence the partial collinearity factor is

$$\frac{|W|\,t_{\xi\xi}}{|T|\,w_{\xi\xi}} \Big/ \Big(1 - \sum_i\sum_j t^{ij} d_i d_j\, b_{\xi\xi} w_{\xi\xi}/t_{\xi\xi}\Big), \qquad (16)$$

which is a (n − 2; p − 1, q − 1) variate.

In the same way, the partial direction factor may be found by maximizing the ratio of
the sum of squares for direction to the reduced residual sum of squares. The maximized
ratio is

$$\frac{t_{\xi\xi}}{t_{\xi\xi} + b_{\xi\xi} w_{\xi\xi} \sum_i\sum_j w^{ij} d_i d_j}, \qquad (17)$$

where w^{ij} is the typical element of the inverse of W.
This is a (n − q; p − 1, 1) variate, so may be tested by the F-test:

$$F = \frac{n-p-q+1}{p-1}\cdot\frac{b_{\xi\xi} w_{\xi\xi} \sum_i\sum_j w^{ij} d_i d_j}{t_{\xi\xi}},$$

with p − 1 and n − p − q + 1 degrees of freedom.
It may be verified that these results agree with the general results given by Bartlett
(1951) and the particular results given by Williams (1952a) for p = 2, q = 2.


VI. GENERAL REMARKS ON THE LINEAR FUNCTIONAL RELATIONSHIP 
As has often been remarked, the linear functional relationship between two variates subject
to error is different from either of the regression relationships between the two; the relationships
are identical only when the independent variate in the regression relationship is
errorless. Also the two relationships have different applications. The regression relationship
is the more generally useful, relating as it does to observed values; an important application 
is the prediction of either observed or ‘true’ values of one variate from observed values of 
the other. The functional relationship connects the ‘true’ values of the two variates; its 
use is in theoretical studies of underlying ‘laws’. In most practical applications the regres- 
sion and not the functional relationship is required. Since we are here considering only 
linear relationships, the word ‘linear’ will usually be omitted. 

The regression relationships are based on the variation in both the ‘true’ values and the 
random errors to which they are subject, the functional relationship on the variation in the 
‘true’ values alone. A little reflexion will show (and examination of the literature will con- 
firm; see, for example, Haavelmo (1943)) that the functional relationship is therefore 
relevant only to a study of how the ‘true’ values of both variates are affected by some 
extraneous variate or variates; that is to say, the relationship shows what elements of the 
system are invariant under changes in conditions. It is not of interest to know the under- 
lying relationship (if any) between two variates when each is affected only by random error; 
usually what is then wanted is one or other of the regression relationships. 

It is well known that, when both the ‘true’ values and the errors are normally distributed, 
the functional relationship cannot be determined from data. Lindley (1947), Reiersol (1950) 
and others have proved a number of theorems which show that the non-normality of one 
of the distributions is necessary for the estimation of the relationship to be possible. Thus
it may be taken that functional relationships are not determinable from the internal 
analysis of a set of variates with normal distributions. Put in another way, since the first- 
and second-order sample cumulants summarize all the information in samples from normal 
populations, it follows that information beyond the first- and second-order cumulants of 
the distributions must be available if the functional relationship is to be determinable. This 
information may be present in the sample, if the distributions are not normal, for then the 
first- and second-order sample cumulants are no longer sufficient statistics; alternatively, 
it may be introduced through knowledge of the relationship of each variate with some 
extraneous variates. As mentioned above, it is only under such circumstances, when some 
such additional information exists, that the functional relationship is of interest; in other 
words, whenever functional relationships are of practical interest they are also determinable 
from data.

In passing it may be mentioned that Berkson’s (1950) method of controlled variables, 
as elucidated by Lindley (1953), is really a means of dealing with one variate as though it 
were errorless, and bringing the estimation of the functional relationship back to the 
estimation of a regression equation. 


The foregoing discussion could be generalized for the determination of relationships
among three or more variates, for which the same general considerations apply. We shall
be concerned henceforth with the determination of functional relationships among a set
of variates through their relationships with the variates of a second set. Reiersol (1945)
has termed the second set instrumental variates, to distinguish them from the first set
of investigational variates. This approach has been used also by Geary (1943, 1949) and
others.
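The distinction drawn above between the regression relationship and the functional relationship, and the role of an instrumental variate in recovering the latter, can be illustrated by a small simulation. This is only a sketch under assumed conditions that are not taken from the paper: a single instrumental variate z, a 'true' value equal to z, unit-variance normal errors on both investigational variates, and a functional slope of 2. The function name `slope_estimates` is hypothetical.

```python
import random


def slope_estimates(n=20000, beta=2.0, seed=1):
    """Simulate y_true = beta * x_true with errors in both observed variates.

    Returns (regression slope of y on x, instrumental-variate slope).
    """
    rng = random.Random(seed)
    zs, xs, ys = [], [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                 # instrumental variate
        x_true = z                              # 'true' value, driven by z
        xs.append(x_true + rng.gauss(0.0, 1.0))         # observed x, with error
        ys.append(beta * x_true + rng.gauss(0.0, 1.0))  # observed y, with error
        zs.append(z)

    def cov(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

    regression = cov(xs, ys) / cov(xs, xs)      # attenuated by the error in x
    instrumental = cov(zs, ys) / cov(zs, xs)    # consistent for beta
    return regression, instrumental


reg_slope, iv_slope = slope_estimates()
```

Under these assumptions the regression slope settles near 1 (half the functional slope, since the error variance of x equals the variance of its 'true' value), while the estimate based on the instrumental variate settles near 2.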
The method of grouping (Wald, 1940; Bartlett, 1949) for determining the functional relationship
is based on the observed values of the variates. Neyman & Scott (1951) have shown that only
‘in very exceptional circumstances’ does the method provide consistent estimates. Roughly
speaking, the method leads to consistent results provided the separation of values of one
of the variates is sufficiently wide in the neighbourhood of the group limits, so that the
grouping based on observed values is equivalent to a grouping based on ‘true’ values.
Clearly, under these conditions, the differences between groups may be attributed to some
extraneous variates, so that the method of grouping is then roughly equivalent to the
method of instrumental variates.

Both methods have this in common, that they introduce extraneous variation affecting 
all the variables between which the relationship is found. The existence or assumption of 
this variation is basic to the methods. 

In the next sections of this paper we give some exact tests for the existence of and for 
the constants in linear functional relationships, based on the use of instrumental variates. 
We shall assume for convenience that the investigational variates are subject to normally 
distributed errors, but make no particular assumptions about the distribution of the 
instrumental variates. Indeed, the instrumental variates may even be comparisons among 
groups or populations, in accordance with the general specification outlined in the Intro- 
duction. 


VII. TESTS FOR A SINGLE FUNCTIONAL RELATIONSHIP 
(a) Preliminary remarks 


In § VI it was pointed out that consistent estimates of a functional relationship between two
variates could be determined provided there was one instrumental variate. In general, a
functional relationship among p variates can be determined if there are p − 1 or more instrumental
variates. In this section we shall consider, not the determination of the functional
relationship, but tests for the concordance with the data of a given relationship.
This approach parallels that adopted above in the sections on discriminant functions.
It is possible to test the adequacy of the given relationship even when the data do not
provide enough information to enable consistent estimates to be determined.

We shall consider, following the set-up given in the earlier sections, that there are p
investigational variables X_i and q instrumental variables Y_j. The given functional relationship
among the X_i will be defined by a null function ξ. It will be realized that, when p
exceeds q, it is possible to choose p − q independent linear functions of the original X_i
whose correlation with the Y_j (or sum of squares between groups) vanishes. These functions
are equivalent to the p − q canonical variates corresponding to the identically vanishing
latent roots. Thus it is not possible to determine from the sample a unique null function.

On the other hand, the test for a given null function is always possible. The general test
for the adequacy of the function can be considered in two parts: first, a test for the conformity
of the function in direction, with p − 1 degrees of freedom; and secondly, a test for
the residual correlation (or reduced variation between groups), with q − p + 1 degrees of
freedom. When q < p, no test for residual correlation is possible; the test for direction then
has but q degrees of freedom.


(b) One instrumental variate (q = 1) 


In order to gain some insight into the problems involved in testing the adequacy of a
specified null function, we consider first the simplest case, where there is but one instrumental
variate, Y. In this case, the test is one of the direction of the chosen null function.
It may be most simply viewed in the following way. The null variate ξ has a sum of squares
for regression on Y which bears to the total sum of squares a ratio λ. It is required to test
whether this is large enough to indicate real association. The discriminant function x
based on the sample gives a ratio θ, leaving a residual 1 − θ, with n − p degrees of freedom.
Hence the efficient test, in which the residual has been minimized with respect to all the
X_i, is given by the following analysis:

Effect                                                  Degrees of freedom    Sum of squares

Additional accounted for by discriminant function x    p − 1                 θ − λ
Direction of null variate                               1                     λ
Residual                                                n − p                 1 − θ
Total                                                   n                     1

so that

$$F(1,\ n-p) = \frac{(n-p)\,\lambda}{1-\theta}.$$

In terms of determinantal ratios, we have

$$\lambda = b_{\xi\xi}/t_{\xi\xi}, \qquad 1-\theta = |W|/|T|,$$

so that

$$F(1,\ n-p) = \frac{(n-p)\, b_{\xi\xi}\,|T|}{t_{\xi\xi}\,|W|}.$$

This latter representation would give the simplest way of making the test.
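The variance ratio for the case q = 1 follows directly from the two ratios λ and θ. A minimal sketch, with a hypothetical function name and arbitrary illustrative inputs:

```python
def f_null_direction(lam, theta, n, p):
    """Variance ratio for the direction of a null variate when q = 1.

    lam   : ratio of regression to total sum of squares for the null variate
    theta : the corresponding ratio for the sample discriminant function
    Returns (F, df1, df2) with df1 = 1 and df2 = n - p.
    """
    df2 = n - p
    return df2 * lam / (1.0 - theta), 1, df2


# Illustrative values only: lam = 0.05, theta = 0.4, n = 25, p = 3.
f_null, df1_null, df2_null = f_null_direction(0.05, 0.4, 25, 3)
```

A large value of F indicates real association of the proposed null variate with Y, and hence disagreement of the proposed relationship with the data.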


(c) More than one instrumental variate (q ≥ 2)

The method given above for q = 1 cannot be directly adapted to the more general case.
The approach that is adopted is as follows. By hypothesis, the null variate ξ is uncorrelated
with the instrumental variates, so that its total variation is homogeneous. In any sample,
it is possible to choose a set of p − 1 linearly independent functions of the X_i which are
uncorrelated with ξ; these may be taken to define the direction of ξ in the p-space. We shall
denote these p − 1 functions by X′_i (i = 1, 2, ..., p − 1).

Now consider the analysis of any one of the original X_i, or of any linear function of them
not linearly dependent on the X′_i. We can carry out an analysis of covariance of this variate
with the p − 1 variates X′_i, to determine the reduced sums of squares for regression on the
Y_j, for residual variation and for total variation (or between groups, within groups and total
in the analysis of variance model). Then, provided q ≥ p, we have the following effects
appearing:

Effect                               Degrees of freedom

Difference of regressions            p − 1
Regression on the Y_j, reduced       q − p + 1
Residual, reduced                    n − p − q + 1
Total, reduced                       n − p + 1



This analysis tests the existence of any correlation of the X_i with the Y_j, apart from that
accounted for by the X′_i, and hence tests the correlation of the proposed null variate. The
second term tests the residual correlation of ξ with the Y_j and the first term the direction;
each may be tested against the reduced residual variance.

The analysis and significance tests are equivalent, whichever X-variate (subject to the
above-mentioned conditions) be chosen, since the elimination of the X′_i by covariance
effectively reduces the number of residual variates to one. In practice it is often simplest
to take the X-variate as ξ itself. The fact that the adjustment of ξ by covariance leaves the
total sum of squares unaltered does not affect the results; the degrees of freedom for the
total line are still reduced to n − p + 1.

It may be verified that, when p = 2, the analysis is exactly the same as that giving the
test of significance for the discriminant function X′_1. This is as it should be, since the hypothesis
that X′_1 is the population discriminant function is equivalent to the hypothesis that
the variate orthogonal to it (i.e. ξ) is a null variate.

The analysis has its parallel in the tests developed for this purpose from Fisher's (1938)
results by Tintner (1945, 1946), and based on the values of the latent roots. For the existence
of a single null function (or linear functional relationship), Tintner takes as his test function
the smallest of p latent roots. As has been shown by Hsu (1941), this smallest latent root is
distributed approximately as a sum of squares with q − p + 1 degrees of freedom. This
approximation becomes exact in the limit when all the other latent roots are near to unity
(apparently even if only the sample latent roots approach unity, regardless of the values of
the population roots). This result has been established above (10) as a particular result on
the factorization of determinantal ratios, except that there the term for ‘difference of
regressions’ has not been segregated from the residual. In the derivation of the tests to be
given below, in terms of canonical correlations, the relationship of the present tests with the
approximate tests will be made clear.


(i) Analysis in terms of canonical correlations
The results of the analysis of covariance may be expressed in terms of the squared canonical
correlation coefficients or discriminant ratios θ_1, θ_2, ..., θ_p between the sets of canonical
variates x_i and y_i. Let the hypothetical null-variate be

$$\xi = \sum m_i x_i, \qquad \sum m_i^2 = 1.$$

Then the set of p − 1 x-variates orthogonal to ξ may be taken as

$$x'_i = x_i - m_i\xi.$$

Now of the y_i, p have non-zero correlation with the corresponding x_i, and q − p have zero
correlation with any x_i. They may be transformed to another set of variates in the following
way: (1) the set of p − 1 variates most closely correlated with the x′_i; (2) a variate η, uncorrelated
with the x′_i but correlated with the x_i; and (3), as before, the q − p variates uncorrelated
with the x_i.

It is clear that the variate η represents the reduced correlation of ξ with the y_i. It is readily
defined by the conditions given; for let

$$\eta = \sum n_i y_i;$$

then

$$\sum \eta\, x'_h = 0 \qquad (h = 1, 2, \dots, p-1).$$


Hence

$$\sum \eta\, x_h = m_h \sum \eta\,\xi,$$

or

$$n_h\theta_h^{1/2} = m_h \sum_i m_i n_i \theta_i^{1/2},$$

so that

$$n_h \propto m_h\theta_h^{-1/2},$$

and, since $\sum n_i^2 = 1$,

$$n_h = m_h\theta_h^{-1/2}\Big(\sum m_i^2/\theta_i\Big)^{-1/2}.$$

For the regression of ξ on η, the sum of squares, with q − p + 1 degrees of freedom, is

$$\Big(\sum \xi\eta\Big)^2 = n_h^2\theta_h/m_h^2 = \Big(\sum m_i^2/\theta_i\Big)^{-1}. \qquad (18)$$

This is the sum of squares for regression, reduced by elimination of the x′_i. Clearly it
vanishes if q < p (for then θ_p = 0), and in general takes values between θ_p and θ_1.

In the same way it may be shown that the reduced residual sum of squares (i.e. the sum
of squares of residuals of ξ from regression on the y_i and the x′_i), with n − p − q + 1 degrees
of freedom, is

$$\Big(\sum m_i^2/(1-\theta_i)\Big)^{-1}. \qquad (19)$$

Finally, by subtraction, the sum of squares for difference of regressions (i.e. for the test
of direction of the proposed null-variate), with p − 1 degrees of freedom, is

$$1 - \Big(\sum m_i^2/\theta_i\Big)^{-1} - \Big(\sum m_i^2/(1-\theta_i)\Big)^{-1}. \qquad (20)$$


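These sums of squares may also be expressed in terms of the latent roots $\theta_i$ and the latent
roots for the variates $x_i'$ and $y_i$, which will be denoted by $\theta_i'$ $(i = 1, 2, \ldots, p-1)$. Since $\xi$ is
orthogonal to all the $x_i'$, it follows that the product of the latent roots $\theta_i'$, multiplied by the
sum of squares for regression of $\xi$ on $\eta$, is equal to the product of the original latent roots;
that is,

$$\prod\theta_i'\,\Big(\sum m_i^2/\theta_i\Big)^{-1} = \prod\theta_i.$$

Similarly,

$$\prod(1-\theta_i')\,\Big(\sum m_i^2/(1-\theta_i)\Big)^{-1} = \prod(1-\theta_i).$$

The analysis of variance may then be set out as follows:

               D.F.           Sum of squares
Direction      $p-1$          $1 - \prod\theta_i/\prod\theta_i' - \prod(1-\theta_i)/\prod(1-\theta_i')$
Correlation    $q-p+1$        $\prod\theta_i/\prod\theta_i'$
Residual       $n-p-q+1$      $\prod(1-\theta_i)/\prod(1-\theta_i')$
Total          $n-p+1$        $1$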


If the latent roots $\theta_i$ and the direction cosines $m_i$ are known, the simplest procedure is to
calculate the sums of squares from the expressions (18), (19) and (20).

The sum of squares for correlation always exceeds $\theta_p$, except when the direction
of $\xi$ approaches that of $x_p$. Hence it is seen that the use of $\theta_p$ as a sum of squares with
$q-p+1$ degrees of freedom, according to the approximate tests, will underestimate the
significance of the correlation.


If $q < p$, the analysis reduces to

               Sum of squares
Direction      $1 - \prod(1-\theta_i)/\prod(1-\theta_i')$
Residual       $\prod(1-\theta_i)/\prod(1-\theta_i')$

there being in this case no test for correlation.

When $q = 1$ this analysis reduces to that previously given; for it may be shown that, in
general,

$$\lambda = \sum\theta_i - \sum\theta_i',$$

so that, in this case,

$$(1-\theta_1)/(1-\theta_1') = 1 - \lambda/(1-\theta_1+\lambda),$$

and the sums of squares for direction and residual are respectively

$$\lambda/(1-\theta_1+\lambda)$$

and

$$(1-\theta_1)/(1-\theta_1+\lambda),$$

which are proportional to those previously given.


It may readily be verified that, when $p = 2$, these sums of squares reduce to those given
for a test of a single discriminant function.


(ii) Analysis in terms of determinantal ratios 

For computing purposes it is more straightforward, when the individual latent roots and
canonical variates are not required, to express the results just given in terms of determinantal
ratios.

We consider the analysis of the null variate $\xi$ ($= \sum l_i X_i$, no longer necessarily normalized
so that its sum of squares is unity), deriving the reduced sum of squares for regression, with
$q-p+1$ degrees of freedom, and the reduced residual sum of squares, with $n-p-q+1$
degrees of freedom. The sum of squares for ‘difference of regressions’, with $p-1$ degrees
of freedom, is obtained by subtraction of these two items from the (reduced) total.

We may take the X; to be 

Х=Х,- И (= 1,2,...,р-1), 
and shall write 
P, for the vector (pi Dis, -- Р)» 
T, for the vector (t; tio, -+ tip) 
Ti... for the vector (ti, tias <<, p-1)» 
P. for the matrix consisting of the first p — 1 rows of P, 
and 7. for the matrix consisting of the first p — 1 rows and columns of 7’. 

To calculate the reduced sum of squares for regression, the sums of squares and products 
of the X; and £ for the regression of each on the Y; are first determined. For example, the 
sum of products of X; with Y, is 

Фу Ру! 


so that the regression sum of products of X; and X; is 
(Аи) U- (P -Fi tglt). 
Accordingly, the matrix of regression sums of squares and products of the X7 is 
(P. — Tg- Ну) UPL — РИ и). (21) 


Now the reduced sum of squares for the regression of £ on the Y, is the ratio of two deter- 
minants: that of the regression sums of squares and products of the X and £, and that of 
the regression sums of squares and products of the X; alone. By transformation without 
change of scale, the X; and £ may be replaced by X;, X, ..., X, ,, and l, X,; hence the 
determinant in the numerator is equal to 


5| PU3P'| =G |B. 


The required denominator is the determinant of the matrix (21), which is found, by elemen- 
tary transformations, to be 


vs 0s m --£ 7 T. 


= 21|, 
ЕЕ 


Hence the reduced sum of squares for regression is 
ват; 

In the same way, it may be shown that the reduced residual sum of squares is 
зт, 


Also, since ETAT: = te, © 
we see that the reduced total sum of squares, which is given by the analogous formula 
(TTA Ў vt 
is equal to fgg, as it should be. Мар Ay 


On elimination of common factors from each term, the analysis of variance may be set
out as follows:


Direction p-1 (by serit 
Regression, reduced q-ptl (TBT) 
Residual, reduced n—p-q*l 


-up-i = qu 
Total, reduced n—ptl (T; TT = tg 


It may readily be verified that, when the variates are transformed to the canonical form,
the terms in this analysis of variance reduce to those given in the previous subsection.


VIII. GENERALIZATIONS


It has been shown above that the test for a given functional relationship as developed in 
this paper is really equivalent to the test for the adequacy of p—1 discriminant functions 
to represent the variation in the data. Thus we have treated here only the tests for 1 or p— 1 
discriminant functions. Tests for any number r of discriminant functions may be derived 
in the same way, but the cases treated above are the simplest to deal with, since they each 
deal with only one function (discriminant function or null function); fiducial limits for the
constants in these functions can therefore be derived using these significance tests. 

If r hypothetical discriminant functions are given, their effect may be eliminated, giving 
the following factorization of the likelihood criterion: 


$$(n; p, q) = (n; r, q)\,(n-r; p-r, q).$$

The second factor on the right is the residual likelihood criterion, which may be further
factorized to give criteria for direction and collinearity:

$$(n-r; p-r, q) = \underbrace{(n-r; p-r, r)}_{\text{simple direction}}\;\underbrace{(n-2r; p-r, q-r)}_{\text{partial collinearity}}$$
$$\phantom{(n-r; p-r, q)} = \underbrace{(n-q; p-r, r)}_{\text{partial direction}}\;\underbrace{(n-r; p-r, q-r)}_{\text{simple collinearity}}.$$

When $r = p-1$, one of the degrees of freedom in each factor is unity, which is the reason
why the tests for a single null function can be thrown into the form of an analysis of variance.
We then have

$$(n-p+1; 1, q) = (n-p+1; 1, p-1)\,(n-2p+2; 1, q-p+1)$$
$$\phantom{(n-p+1; 1, q)} = (n-q; 1, p-1)\,(n-p+1; 1, q-p+1),$$


which corresponds to the analysis of variance set-up previously given. 

There seem to be practical applications of tests for more than one functional relationship; 
for example, Bartlett (1948) discusses the problem of estimating supply and demand 
relationships. On the other hand, it seems that seldom if ever is more than one discriminant 
function required in specifying differences among populations; if these differences cannot 
be interpreted in terms of one discriminant function, what is then required is some likeli- 
hood test as a means of classifying the observations into the different populations. 


REFERENCES 


ANDERSON, T. W. (1951). Estimating linear restrictions on regression coefficients for multivariate
normal distributions. Ann. Math. Statist. 22, 327.

BARTLETT, M. S. (1938). Further aspects of the theory of multiple regression. Proc. Camb. Phil. Soc.
34, 33.

BARTLETT, M. S. (1939). A note on tests of significance in multivariate analysis. Proc. Camb. Phil.
Soc. 35, 180.

BARTLETT, M. S. (1947). Multivariate analysis. J. R. Statist. Soc. Supp. 9, 176.

BARTLETT, M. S. (1948). A note on the statistical estimation of supply and demand relations from time
series. Econometrica, 16, 323.

BARTLETT, M. S. (1949). Fitting a straight line when both variables are subject to error. Biometrics.

BARTLETT, M. S. (1951). The goodness of fit of a single hypothetical discriminant function in the case
of several groups. Ann. Eugen., Lond., 16, 199.

BERKSON, J. (1950). Are there two regressions? J. Amer. Statist. Ass. 45, 164.

FISHER, R. A. (1938). The statistical utilization of multiple measurements. Ann. Eugen., Lond., 8, 376.

GEARY, R. C. (1943). Relations between statistics: the general and the sampling problem when the
samples are large. Proc. R. Irish Acad. A, 49, 177.

GEARY, R. C. (1948). Studies in relations between economic time series. J. R. Statist. Soc. B, 10, 140.

GEARY, R. C. (1949). Determination of linear relations between systematic parts of variables with
errors of observation the variances of which are unknown. Econometrica, 17, 30.

HAAVELMO, T. (1943). The statistical implications of a system of simultaneous equations.
Econometrica, 11, 1.

HSU, P. L. (1941). On the problem of rank and the limiting distribution of Fisher's test function.
Ann. Eugen., Lond., 11, 39.

KOOPMANS, T. (1937). Linear Regression Analysis of Economic Time Series. Haarlem: De Erven F.
Bohn N.V.

LINDLEY, D. V. (1947). Regression lines and the linear functional relationship. J. R. Statist. Soc.
Supp. 9, 218.

LINDLEY, D. V. (1953). Estimation of a functional relationship. Biometrika, 40, 47.

NEYMAN, J. & SCOTT, ELIZABETH L. (1951). On certain methods of estimating the linear structural
relation. Ann. Math. Statist. 22, 352.

REIERSOL, O. (1941). Confluence analysis by means of lag moments and other methods of confluence
analysis. Econometrica, 9, 1.

REIERSOL, O. (1945). Confluence analysis by means of instrumental sets of variables. Ark. Mat.
Astr. Fys. A, 32, no. 4.

REIERSOL, O. (1950). Identifiability of a linear relation between variables which are subject to error.
Econometrica, 18, 375.

TINTNER, G. (1945). A note on rank, multicollinearity and multiple regression. Ann. Math. Statist.
16, 304.

TINTNER, G. (1946). Multiple regression for systems of equations. Econometrica, 14, 5.

TINTNER, G. (1950). A test for linear relations between weighted regression coefficients. J. R. Statist.
Soc. B, 12, 273.

WALD, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math.
Statist. 11, 284.

WILLIAMS, E. J. (1952a). Some exact tests in multivariate analysis. Biometrika, 39, 17.

WILLIAMS, E. J. (1952b). The interpretation of interactions in factorial experiments. Biometrika,
39, 65.

WILLIAMS, E. J. (1952c). Use of scores for the analysis of association in contingency tables.
Biometrika, 39, 274.

WILLIAMS, E. J. (1953). Tests of significance for concurrent regression lines. Biometrika, 40, 297.




THE USE OF TRANSFORMATIONS AND MAXIMUM LIKELIHOOD 
IN THE ANALYSIS OF QUANTAL EXPERIMENTS 
INVOLVING TWO TREATMENTS 


By F. YATES, F.R.S.
Rothamsted Experimental Station 


If the effect of a treatment is of the quantal (all or nothing) type, e.g. it kills or does not kill
an insect, the results of experiments will be in the form of proportions of experimental units
affected. If $t$ such treatments are compared in a single experiment in which the experimental
units are randomly allocated, the differences between the observed proportions will
give estimates of the differences between the treatments, and a $\chi^2$ test on the associated
$2 \times t$ table ($t-1$ d.f.) provides an overall test of significance. If, however, in order to
increase the precision, the units are subdivided before allocation into groups which are 
relatively homogeneous within themselves, the experiment will be analogous to an ordinary 
randomized block experiment, the groups being the blocks, but the quantities requiring 
analysis (corresponding to the ‘plot yields’) will be proportions instead of quantitative 
measurements. When a set of experiments is carried out, with random allocation within 
each experiment, but with possible lack of homogeneity between experiments, the situation 
is similar, the groups being in this case experiments. 

If the numbers of experimental units in the various cells of the group × treatment or
experiment x treatment table are approximately equal, or more generally if they are pro- 
portionate, the analysis presents no great difficulty. The use of the angular transformation, 
and an analysis of variance of the transformed values, will frequently be all that is required. 
Indeed, if the variation in susceptibility from experiment to experiment is not large, the 
results will not be seriously distorted if the data from the different groups or experiments 
are pooled and treated as homogeneous. If, however, the numbers in the different cells 
vary in an irregular manner, as is often the case in material of this kind, pooling is inadmis- 
sible and inequality in the weights prevents straightforward analysis. 

The application of methods of estimation to quantal data has been extensively developed 
in recent years through the use of transformations in conjunction with maximum likelihood. 
Exact analytical methods were first developed for the probit transformation, in connexion 
with biological assay, but similar methods have recently been developed for other trans- 
formations of which the logit and log log are of most interest in this connexion. These methods 
are beginning to be used for the analysis of quantal data of many kinds (see, for example, 
Jolly, 1950; Dyke & Patterson, 1952; and Yates, 1953, §9.7), but their utility for dealing
with experimental data of the type we are considering is still insufficiently realized.
Combination of probabilities, for example, is still commonly used to provide a test of significance
(sometimes called the probability integral test) for sets of experiments involving two
treatments.

The estimation approach automatically provides a test of significance as a by-product
of the estimate and its estimate of error. Its greatest advantage, however, is that if an
effect is demonstrated we are immediately provided with an estimate of its magnitude in
meaningful terms, together with approximate fiducial limits. At the same time, answers are
provided to certain ancillary questions regarding the homogeneity of the data. 

The analysis of a group of experiments containing several treatments has been discussed 
by Jolly using the probit transformation. The present paper deals in more detail with the 
case of two treatments, particularly in respect of errors and tests of significance. The appro- 
priateness of the various transformations under different circumstances is also considered 
in more detail than by Jolly. A further new point is the use of weighted estimates to give 
a first approximation. The investigation originated in a request by Prof. Gert Bonnier of 
Stockholm for advice on the exact procedure for applying the combination of probabilities 
test to a set of genetical experiments on mutation rates in Drosophila, and I am much in- 
debted to him for permission to reproduce the data here. The numbers involved in these 
experiments were sufficiently large for the data to be analysed by large-sample methods, 
and this analysis will first be described, as it brings out many of the points at issue without 
the complexities associated with the use of transformations and the method of maximum 
likelihood. The maximum-likelihood method will then be developed, and applied to this
set of data and to three examples given by Pearson (1950). 


THE COMBINATION OF PROBABILITIES TEST 


If we have a set of $k$ experiments (or observational data) comparing two treatments, the
results of each experiment can be set out in the form of a 2 × 2 table. A test of significance
on the data as a whole can be made by calculating the significance level $P$ for each experiment
separately, and then combining these probabilities by the method first suggested by
Fisher in the fourth edition of Statistical Methods for Research Workers (1932), and independently
by Karl Pearson (1933). This consists of calculating the sum of the quantities
$-2\log_e P$, which are the values of $\chi^2$ for 2 d.f. corresponding to the significance levels $P$.
If there is no difference between the treatments, then, subject to certain qualifications,
$S(-2\log_e P)$ will be distributed as $\chi^2$ with $2k$ d.f.
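The calculation just described can be sketched in a few lines of Python. This is only an illustration: the function name `fisher_combined` and the example significance levels are hypothetical, and the tail area uses the closed form of the $\chi^2$ integral that holds for an even number of degrees of freedom.

```python
import math

def fisher_combined(p_values):
    """Return (statistic, combined P) for the combination of probabilities test.

    The statistic is S(-2 log_e P); under the null hypothesis it is referred
    to chi-square with 2k degrees of freedom, k being the number of tests.
    """
    k = len(p_values)
    stat = sum(-2.0 * math.log(p) for p in p_values)
    # For even degrees of freedom 2k the upper tail area of chi-square is
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = stat / 2.0
    tail = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))
    return stat, tail

# Hypothetical one-tail significance levels from three separate experiments.
stat, combined_p = fisher_combined([0.20, 0.08, 0.11])
print(round(stat, 3), round(combined_p, 4))
```

With a single experiment the combined level reduces to the original $P$, which is a convenient check on the formula.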

This test and the analogous test provided by direct summation of the values of $\chi^2$ for the
separate experiments have been discussed by various authors, in particular Cochran (1942,
1952), Lancaster (1949) and Pearson (1950). Certain further points emerged in the course
of the present investigation, but these are best dealt with in a separate paper (Yates, 1955).
Here it is only necessary to make the following general points:

(1) Application of the combination of probabilities test does not provide an analysis
which is in any sense complete, since no estimate of the magnitude of the treatment difference
is provided.

(2) The test must be in some degree inefficient. There are two main reasons for this:

(a) No account is taken of the relative accuracy of the different experiments when
combining the results.

(b) $S(-2\log_e P)$ does not provide an efficient estimate of the treatment difference, even
when all experiments are of the same accuracy and the difference is small, and consequently
cannot be the basis of an efficient test of significance.

It is perhaps also worth noting here that Fisher did not envisage the use of the combination
of probabilities test on data of the type we are considering. He put it forward for
use in cases in which it is desired to obtain a single test of the significance of an aggregate of
probabilities, ‘taking account only of these probabilities, and not of the detailed composition
of the data from which they are derived, which may be of very different kinds’.


DROSOPHILA DATA: THE LARGE SAMPLE APPROACH 


The data with which we are concerned are shown in Table 1. They were derived from a set 
of six experiments on the frequency of lethals in paternal X-chromosomes from irradiated 
spermatozoa of Drosophila melanogaster. The experiments were carried out to test whether 
there was any difference in this frequency when the irradiation was given to spermatozoa 
which were harboured (F) in the females’ receptacles, and (M) in the males’ testicles. In the 
first four experiments a dose of 960r. was given, and in the last two a dose of 3000r. In 
addition to the change of irradiation rate it was believed that unavoidable environmental 
changes between experiments might influence the mutation rates. An account of the 
experiments has now been published (Bonnier & Lüning, 1953). 


Table 1. Frequency of mutants with two methods of irradiation

                 Method F                        Method M
Exp.    Mutant   Normal   % mutant      Mutant   Normal   % mutant
 1        29       815      3.44          45      1486      2.94
 2        78      2622      2.89          27      1438      1.84
 3       110      3175      3.35         100      4281      2.28
 4        25      1038      2.35          52      2058      2.47
 5        31       339      8.38          43       507      7.82
 6        57       543      9.50          27       436      5.83


It will be seen that in all experiments except Exp. 4 method F gives a higher mutation 
rate than method M. We require to test the significance of this difference, and obtain an 
estimate of its magnitude. 

In order to develop a suitable estimation procedure we must specify in mathematical
terms the quantities that require estimation. This specification will depend on the phenomena
that are being investigated. In the present case we may postulate that with a given
method of irradiation and an experiment of standard sensitivity a unit of irradiation has
a given small chance $\theta'$ of producing a mutation at a given locus. The chance that $\lambda$ units of
irradiation will produce a mutation at that locus will then be

$$1 - e^{-\lambda\theta'}.$$

If several loci are involved, with probabilities $\theta'$, $\theta''$, etc., and $\theta' + \theta'' + \ldots = \theta$, the total
probability of one or more mutations will be

$$1 - e^{-\lambda(\theta' + \theta'' + \ldots)} = 1 - e^{-\lambda\theta}.$$

If the conditions vary from experiment to experiment a sensitivity factor $\mu_r$ will have to be
introduced, giving a total probability of mutation for experiment $r$ of $1 - e^{-\lambda_r\mu_r\theta}$, where $\lambda_r$
is the irradiation rate for the $r$th experiment.

With two methods of irradiation the probabilities of mutation in experiment $r$ can
therefore be written as

$$\pi_{1r} = 1 - \exp[-\lambda_r\mu_r\theta_1],$$
$$\pi_{2r} = 1 - \exp[-\lambda_r\mu_r\theta_2].$$

Since $\mu_r$ is unknown it must be eliminated, giving

$$\frac{\theta_1}{\theta_2} = \frac{\log(1-\pi_{1r})}{\log(1-\pi_{2r})}.$$

Substituting the observed mutation rates $p_{1r}$ and $p_{2r}$ for $\pi_{1r}$ and $\pi_{2r}$ we obtain an estimate
of $\theta_1/\theta_2$ from the $r$th experiment. These separate estimates can then be combined by
weighting inversely as their estimated variances.
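As a small illustration of the substitution just described, the following Python sketch computes the estimate of $\theta_1/\theta_2$ from a single experiment. The function name `theta_ratio` is hypothetical; the counts are those of Exp. 1 in Table 1 (29 mutants out of 844 for method F, 45 out of 1531 for method M).

```python
import math

def theta_ratio(mutants_f, total_f, mutants_m, total_m):
    """Estimate theta_1/theta_2 from one experiment as log(1-p1)/log(1-p2)."""
    p1 = mutants_f / total_f
    p2 = mutants_m / total_m
    return math.log(1.0 - p1) / math.log(1.0 - p2)

# Exp. 1 of Table 1.
ratio_exp1 = theta_ratio(29, 844, 45, 1531)
print(round(ratio_exp1, 3))
```

Because the sensitivity factor and the irradiation rate enter both logarithms as a common multiplier, they cancel in the ratio, which is why the estimate needs only the two observed proportions.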

In the present example the mutation rates are sufficiently small for $\pi_{1r}$ and $\pi_{2r}$ to be taken
as approximately equal to $\lambda_r\mu_r\theta_1$ and $\lambda_r\mu_r\theta_2$ respectively. There will be certain advantages
in working with estimates of

$$\frac{\theta_1 - \theta_2}{\tfrac{1}{2}(\theta_1 + \theta_2)}$$

instead of $\theta_1/\theta_2$, i.e. the ratio of the difference of the mutation rates for experiments of unit
sensitivity per unit irradiation to the mean of these mutation rates. The quantity

$$E_r = \frac{p_{1r} - p_{2r}}{\tfrac{1}{2}(p_{1r} + p_{2r})}$$

will provide an estimate of this ratio. An approximation to the variance of $E_r$ is easily
calculated by large-sample theory as follows:

$$\frac{\partial E_r}{\partial p_{1r}} = \frac{4p_{2r}}{(p_{1r} + p_{2r})^2}, \qquad \frac{\partial E_r}{\partial p_{2r}} = -\frac{4p_{1r}}{(p_{1r} + p_{2r})^2}.$$

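If the $r$th contingency table is written

             F                      M                      Total
Mutant       $n'_{1r}$              $n'_{2r}$              $n'_r$
Normal       $n_{1r} - n'_{1r}$     $n_{2r} - n'_{2r}$     $n_r - n'_r$
Total        $n_{1r}$               $n_{2r}$               $n_r$

so that $p_{1r} = n'_{1r}/n_{1r}$, $p_{2r} = n'_{2r}/n_{2r}$, we have

$$V(p_{1r}) = \frac{\pi_{1r}(1-\pi_{1r})}{n_{1r}}, \qquad V(p_{2r}) = \frac{\pi_{2r}(1-\pi_{2r})}{n_{2r}}, \qquad \operatorname{cov}(p_{1r}, p_{2r}) = 0.$$

Hence, approximately,

$$V(E_r) = \frac{16\,p_{1r}p_{2r}}{(p_{1r} + p_{2r})^4}\left\{\frac{p_{2r}q_{1r}}{n_{1r}} + \frac{p_{1r}q_{2r}}{n_{2r}}\right\}. \qquad (1)$$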

This formula is approximate both because we are dealing with a non-linear function and
because $\pi_{1r}$ and $\pi_{2r}$ have been replaced by $p_{1r}$ and $p_{2r}$. If we are concerned only with the
significance of the difference of $p_{1r}$ and $p_{2r}$ we may replace them by the pooled estimate
$p_r = n'_r/n_r$ from the marginal totals of the table. Since the factor $1/(p_{1r} + p_{2r})$ occurs in $E_r$,
the factor $1/(p_{1r} + p_{2r})^2$ may be left unchanged in the variance. We then have

$$V(E_r) = \frac{4\,p_r q_r n_r}{(p_{1r} + p_{2r})^2\, n_{1r} n_{2r}}. \qquad (2)$$

It may be noted that under these circumstances

$$\frac{E_r^2}{V(E_r)} = \frac{\{n'_{1r}(n_{2r} - n'_{2r}) - n'_{2r}(n_{1r} - n'_{1r})\}^2\, n_r}{n_{1r}\, n_{2r}\, n'_r (n_r - n'_r)},$$

which is the ordinary expression for $\chi^2$ in a 2 × 2 contingency table, without correction for
continuity.

As has been shown by Cochran (1942) and others, the correction for continuity is not
required when combining data from several tables. If, therefore, we take a weighted mean
of the estimates $E_r$ with weights $1/V(E_r)$, the resultant test of significance may be expected
to be adequately accurate when the values of $\chi^2$ corrected for continuity give satisfactory
approximations to the one-tail probabilities of the distributions generated by the 2 × 2
tables. This will be the case in the present instance, since all expectations are reasonably
large and $n_{1r}$ and $n_{2r}$ differ by less than a factor of 2 in all experiments, and are very nearly
equal in total.

The calculations for $\bar{E}$ using formula (2) for $V(E_r)$ are shown in Table 2. The weighted
mean of $E_r$ is

$$\bar{E} = 44.182/153.85 = 0.2872,$$

with standard error $1/\sqrt{153.85} = 0.0806$. The level of significance (one tail) is therefore
$P = 0.000184$.


Table 2. Calculation of $\bar{E}$

Exp.       $E_r$        $V(E_r)$       $w_r$       $w_r E_r$
 1       +0.1559       0.05459       18.32        2.856
 2       +0.4421       0.04622       21.63        9.563
 3       +0.3786       0.01790       55.88       21.156
 4       −0.0489       0.05776       17.31       −0.846
 5       +0.0692       0.05099       19.61        1.357
 6       +0.4785       0.04739       21.10       10.096

         +0.2872 ± 0.0806           153.85       44.182

$P = 0.000184$.
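The large-sample calculation summarized in Table 2 can be retraced with a short Python sketch. It is an illustration only: the variable names are hypothetical, the counts are taken from Table 1 (totals being mutant plus normal), and formula (2) is used in the form $4p_rq_r(1/n_{1r} + 1/n_{2r})/(p_{1r} + p_{2r})^2$. Small differences from the printed table in the last digits are to be expected.

```python
import math

# (mutants, total) for each experiment, methods F and M, from Table 1.
F = [(29, 844), (78, 2700), (110, 3285), (25, 1063), (31, 370), (57, 600)]
M = [(45, 1531), (27, 1465), (100, 4381), (52, 2110), (43, 550), (27, 463)]

def estimate_and_variance(f, m):
    """Return E_r and V(E_r) by formula (2) for one 2 x 2 table."""
    (a1, n1), (a2, n2) = f, m
    p1, p2 = a1 / n1, a2 / n2
    p = (a1 + a2) / (n1 + n2)                     # pooled proportion p_r
    e = (p1 - p2) / (0.5 * (p1 + p2))
    v = 4.0 * p * (1.0 - p) * (1.0 / n1 + 1.0 / n2) / (p1 + p2) ** 2
    return e, v

rows = [estimate_and_variance(f, m) for f, m in zip(F, M)]
weights = [1.0 / v for _, v in rows]
e_bar = sum(w * e for w, (e, _) in zip(weights, rows)) / sum(weights)
se = 1.0 / math.sqrt(sum(weights))

# One-tail significance level from the normal deviate e_bar / se.
p_one_tail = 0.5 * math.erfc((e_bar / se) / math.sqrt(2.0))

for r, ((e, v), w) in enumerate(zip(rows, weights), start=1):
    print(r, round(e, 4), round(v, 5), round(w, 2), round(w * e, 3))
print(round(e_bar, 4), round(se, 4), p_one_tail)
```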


The results of Table 2 also provide a simple test of the constancy of $(\theta_1 - \theta_2)/\tfrac{1}{2}(\theta_1 + \theta_2)$
over the different experiments. All that is necessary is to calculate the weighted sum of
squares of deviations of $E_r$. This is

$$S\,w_r(E_r - \bar{E})^2 = S\,w_rE_r^2 - \bar{E}\,S\,w_rE_r = 4.902.$$

If there is no real variation this will be distributed as $\chi^2$ with 5 d.f. This gives $P = 0.42$, so
there is no evidence of variation.
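The homogeneity test can be sketched as follows, working from the rounded entries of Table 2. The names are hypothetical, and the tail area of $\chi^2$ with 5 d.f. is obtained from the closed form available for odd degrees of freedom rather than from tables. Recomputing from the rounded entries gives a value close to, but not identical with, the figure quoted in the text.

```python
import math

# E_r and w_r as printed in Table 2.
E = [0.1559, 0.4421, 0.3786, -0.0489, 0.0692, 0.4785]
W = [18.32, 21.63, 55.88, 17.31, 19.61, 21.10]

sum_w = sum(W)
sum_we = sum(w * e for w, e in zip(W, E))
e_bar = sum_we / sum_w

# Weighted sum of squares of deviations: S w E^2 - E_bar * S w E.
chi2 = sum(w * e * e for w, e in zip(W, E)) - e_bar * sum_we

def chi2_tail_5df(x):
    """Upper tail area of chi-square with 5 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0) * (1.0 + x / 3.0))

p_value = chi2_tail_5df(chi2)
print(round(chi2, 3), round(p_value, 3))
```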


The same procedure may be adopted to test whether there is any evidence of variation
in the mutation rates from experiment to experiment, apart from those due to differences
in irradiation rate. For this purpose it will be appropriate to test whether the estimates of
$\tfrac{1}{2}(\theta_1 + \theta_2)$ are homogeneous. These estimates are given in Table 3, together with their
standard errors calculated in the conventional manner. The value of the weighted sum
of squares of deviations of these rates is 5.32 ($\chi^2$ with 5 d.f.) giving $P = 0.38$. There is thus
no evidence of any such variation. The postulated law also accounts satisfactorily for the
difference in mutation rates at the different levels of irradiation.


Table 3. Estimates of $\tfrac{1}{2}(\theta_1 + \theta_2) \times 10^4$

Exp.
 1       0.332 ± 0.0396
 2       0.246 ± 0.0248
 3       0.293 ± 0.0201
 4       0.251 ± 0.0300
 5       0.270 ± 0.0306
 6       0.256 ± 0.0270

         0.272 ± 0.0110


It may be noted that if the irradiation rate had been constant, and the difference between
$\theta_1$ and $\theta_2$ could be neglected, the ordinary $\chi^2$ test on the 2 × 6 contingency table of the values
of $n'_r$ and $n_r - n'_r$ could be used to test the constancy of the mutation rate from experiment
to experiment.

Since the difference between the $p_{1r}$ and $p_{2r}$ is demonstrated we might revise the calculations
by using formula (1) for $V(E_r)$. This, however, is merely a partial step in the direction
of the small sample solution by maximum likelihood, which will be outlined in the following
sections.

Cochran (1954) has adopted a similar procedure to that given above in order to provide
a test of significance for a set of 2 × 2 contingency tables. His test criterion is the weighted
mean of the quantities $p_{1r} - p_{2r}$. This is, in fact, equivalent to a weighted mean of the quantities

$$E'_r = \frac{p_{1r} - p_{2r}}{p_r q_r}.$$

$E'_r$ differs from $E_r$ only in the substitution of the pooled estimate $p_r$ and the introduction
of the factor $q_r$, which gives a difference, analogous to logits, which may be expected to be
reasonably constant over a wide range of $p_r$. Regarded in this way, therefore, Cochran's
analysis could easily be extended to provide a test for the constancy of the difference from
experiment to experiment.


MAXIMUM-LIKELIHOOD SOLUTION: CHOICE OF TRANSFORMATION 


The maximum-likelihood solution is most easily obtained by the use of a suitable transformation,
as, for example, the probit transformation, which has become familiar in
toxicology and biological assay. The maximum-likelihood solution can then be obtained by
successive approximation. The choice of transformation depends on the mathematical
specification (the ‘model’) which is considered most appropriate to the data under
investigation. When this specification has been decided the transformation chosen should be
such that the quantities that require estimation are simple functions of the transformed
variate; thus in probit work it is anticipated that the transformed variate will bear a linear
relation to the dosage when the latter is expressed in suitable units (usually logarithms), and
the quantities that require estimation are the constant term and the slope of this regression.
A secondary requirement is that in no part of the range (or at least in no part covered by the
data) do the working values or the weights associated with the transformed variate become
infinite.
In the genetical example given above the transformation

$$y = \log(-\log q)$$

is obviously suitable. For with this transformation if $Y_{1r}$ and $Y_{2r}$ represent the transformed
values of $\pi_{1r}$ and $\pi_{2r}$ we have

$$Y_{1r} = \log\lambda_r + \log\mu_r + \log\theta_1,$$
$$Y_{2r} = \log\lambda_r + \log\mu_r + \log\theta_2.$$


The effects of the treatments and of the variations in sensitivity are therefore represented 
by additive components in the transformed scale. This transformation, which is the same 
as the ordinary log log transformation with p replaced by g, may be termed the comple- 
mentary log log transformation. 
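The additivity on the transformed scale is easy to check numerically. In the Python sketch below the function name `cloglog` and the values of $\lambda$, $\mu$, $\theta_1$ and $\theta_2$ are purely illustrative assumptions, not estimates from the data.

```python
import math

def cloglog(p):
    """Complementary log log transformation y = log(-log q), with q = 1 - p."""
    return math.log(-math.log(1.0 - p))

# Hypothetical irradiation rate, sensitivity factor and treatment constants.
lam, mu, theta1, theta2 = 960.0, 1.2, 3.0e-5, 2.0e-5

pi1 = 1.0 - math.exp(-lam * mu * theta1)
pi2 = 1.0 - math.exp(-lam * mu * theta2)

# On the transformed scale the treatment difference is log(theta1/theta2),
# whatever the values of lam and mu.
difference = cloglog(pi1) - cloglog(pi2)
print(difference, math.log(theta1 / theta2))
```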

In many cases in which there is no clearly appropriate theoretical model, and in which 
there is no reason to differentiate between the two ends of the probability scale, the logit 
transformation 

$$z = \tfrac{1}{2}\log(p/q)$$
is likely to be suitable. When the p’s are all small the logit transformation is equivalent to 
one-half the complementary log log transformation, and when the p’s are all nearly unity 
the logit transformation is equivalent to one-half the ordinary log log transformation.* 
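The near equivalence for small proportions can be seen numerically. The sketch below (function names are hypothetical) compares the logit, defined as $\tfrac{1}{2}\log(p/q)$, with one-half the complementary log log value for a few small values of $p$; the difference is roughly $p/4$.

```python
import math

def logit(p):
    """Logit as defined in the text: z = (1/2) log(p/q)."""
    return 0.5 * math.log(p / (1.0 - p))

def half_cloglog(p):
    """One-half the complementary log log transformation."""
    return 0.5 * math.log(-math.log(1.0 - p))

for p in (0.001, 0.01, 0.05):
    print(p, round(logit(p), 4), round(half_cloglog(p), 4),
          round(logit(p) - half_cloglog(p), 5))
```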

If the logit transformation is used the analysis will be conducted on the assumption that
the treatment and sensitivity effects are additive on the transformed scale, i.e. that, except 
for random variations, the difference in the logits for the two treatments is constant from 
experiment to experiment. 

In certain circumstances the probit transformation may be considered to be more appropriate
than the logit transformation. Except at the extreme ends of the scale, however,
there is little difference between them, and in the absence of any theoretical reasons for 
choosing one or the other it will rarely be possible to determine, by reference to the data 


themselves, which specification gives a better representation of the effects under investiga- 
tion. 


MAXIMUM-LIKELIHOOD EQUATIONS 


As has been demonstrated in a number of places, the general procedure of obtaining the
maximum-likelihood solution when using a transformation of the form $y = \phi(p)$ is as follows:

(1) Determine a set of $N$ provisional values $Y$ of the transformed variate for the $N$ observed
proportions. These can conveniently be deduced from preliminary (here called first)
estimates of the required parameters or, when regression lines are involved, graphically.
In many cases it pays, when making the first estimate, to use weights similar to those used
in the further steps of the analysis.

* In the author's Sampling Methods (2nd edition, 1953) the logit transformation is defined as
log p/q. This does not accord with modern usage, and was due to an over-hasty reference to
Finney's Statistical Method in Biological Assay (1952).

(2) For each observed proportion $p$ (given by say $n'$ successes out of $n$ observations)
calculate a working value given (when $y$ increases uniformly with $p$) by

$$y = pY_{\max} + qY_{\min} = Y_{\max} - q(\text{Range}) = Y_{\min} + p(\text{Range}),$$

where $Y_{\max}$ and $Y_{\min}$ are the maximum and minimum working values corresponding to
the assigned provisional value $Y$, and $Y_{\max} - Y_{\min} = \text{Range}$. Determine also the weight
$W$ to be assigned to the working value. This is given by $nw$, where $w$ is the weighting
coefficient. $Y_{\max}$, $Y_{\min}$ and $w$ are functions of $Y$ (or of the corresponding untransformed
proportion P) whose form depends on the transformation. The general formulae are 


——— ee ee 


For the transformations commonly in use tables of these functions are readily available.
If $y$ decreases uniformly with increasing $p$ the formulae for $Y_{\max}$ and $Y_{\min}$ are interchanged
and $p$ and $q$ are interchanged in the formulae for the working value $y$.

(3) Using the working values as observed variates obtain revised estimates of the parameters
by the ordinary method of least squares, weighting each observation by the weight
determined as above.

(4) From the revised estimates of the parameters determine a new set of provisional
values, and repeat the whole process as often as is necessary.
In the case of the genetic example already discussed in large sample terms we may write

$$\log\theta_1 = t_1, \qquad \log\theta_2 = t_2,$$
$$\log\lambda_r = l_r, \qquad \log\mu_r = m_r \qquad (r = 1, \ldots, k).$$

$t_1$, $t_2$ and the $m_r$ are the parameters that have to be estimated.
If $t_1'$, $t_2'$ and $m_r'$ are first estimates of these parameters provisional values are given by
the equations

$$Y_{1r} = t_1' + m_r' + l_r, \qquad Y_{2r} = t_2' + m_r' + l_r. \qquad (3)$$

Experiment $r$ gives the two observation equations

$$t_1 + m_r = y_{1r} - l_r + \epsilon_{1r} \qquad \text{weight } W_{1r},$$
$$t_2 + m_r = y_{2r} - l_r + \epsilon_{2r} \qquad \text{weight } W_{2r},$$

there being $k$ such pairs of equations in all.
The ordinary method of least squares then gives the normal equations:

$$t_1\,\Sigma W_{1r} + \Sigma\, m_r W_{1r} = \Sigma\, W_{1r}(y_{1r} - l_r),$$
$$t_2\,\Sigma W_{2r} + \Sigma\, m_r W_{2r} = \Sigma\, W_{2r}(y_{2r} - l_r),$$
$$m_r(W_{1r} + W_{2r}) + t_1W_{1r} + t_2W_{2r} = W_{1r}(y_{1r} - l_r) + W_{2r}(y_{2r} - l_r) \qquad (r = 1, \ldots, k),$$

where $\Sigma$ denotes summation over $r$ from 1 to $k$ and $t_1$, $t_2$ and $m_r$ now denote estimates.
These equations are of the same form as equation (1) of Jolly’s paper.


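Since one of these equations is redundant we may conveniently put
$$\Sigma m_r(W_{1r} + W_{2r}) = 0. \qquad (4)$$
The sum of the first two equations then gives
$$t_1\Sigma W_{1r} + t_2\Sigma W_{2r} = \Sigma W_{1r}y_{1r} + \Sigma W_{2r}y_{2r} - \Sigma(W_{1r} + W_{2r})l_r. \qquad (5)$$
Substituting the values of $m_r$ given by the last $k$ equations in either of the first two equations,
we obtain
$$t_1 - t_2 = \Sigma\frac{W_{1r}W_{2r}}{W_{1r} + W_{2r}}(y_{1r} - y_{2r}) \Big/ \Sigma\frac{W_{1r}W_{2r}}{W_{1r} + W_{2r}}. \qquad (6)$$
Thus $t_1 - t_2$ is a weighted mean of $y_{1r} - y_{2r}$ with weights $W_{1r}W_{2r}/(W_{1r} + W_{2r})$.

To obtain $m_r$, equations (5) and (6) may be solved for $t_1$ and $t_2$. We then have
$$m_r + l_r = \frac{1}{W_{1r} + W_{2r}}\{W_{1r}(y_{1r} - t_1) + W_{2r}(y_{2r} - t_2)\}. \qquad (7)$$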


When the logit transformation is used and the treatment and sensitivity effects are 
assumed additive on the logit scale, the solution is identical with that given above, except 
for the omission of the $l$ terms.
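
The solution of the normal equations by way of (4) to (7) can be sketched as below. The function name and the small data set are hypothetical (they are not the Drosophila figures); the sketch only follows the order of calculation described in the text: the weighted mean difference from (6), then $t_2$ and $t_1$ from (5), then the $m_r$ from (7).

```python
def solve_normal_equations(y1, y2, W1, W2, l):
    """Return (t1, t2, m) from working values y, weights W and known terms l."""
    S1, S2 = sum(W1), sum(W2)
    u = [a * b / (a + b) for a, b in zip(W1, W2)]
    # Equation (6): t1 - t2 is a weighted mean of y1r - y2r.
    d = sum(ui * (a - b) for ui, a, b in zip(u, y1, y2)) / sum(u)
    # Equation (5): t1*S1 + t2*S2 equals the right-hand side below.
    rhs = (sum(w * (y - lr) for w, y, lr in zip(W1, y1, l))
           + sum(w * (y - lr) for w, y, lr in zip(W2, y2, l)))
    t2 = (rhs - S1 * d) / (S1 + S2)
    t1 = t2 + d
    # Equation (7): m_r + l_r is a weighted mean of y1r - t1 and y2r - t2.
    m = [(a * (ya - t1) + b * (yb - t2)) / (a + b) - lr
         for a, b, ya, yb, lr in zip(W1, W2, y1, y2, l)]
    return t1, t2, m


# Hypothetical working values, weights and l terms for k = 3 experiments.
y1 = [-3.37, -3.53, -3.38]
y2 = [-3.53, -4.01, -3.76]
W1 = [28.0, 81.0, 110.0]
W2 = [45.0, 26.0, 101.0]
l = [6.8669, 6.8669, 8.0064]
t1, t2, m = solve_normal_equations(y1, y2, W1, W2, l)
print(t1, t2, m)
```

With all the `l` values set to zero the same function gives the solution for the logit case mentioned above.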


APPLICATION TO DROSOPHILA DATA 


The computations leading to the normal equations are set out in Table 4. The first estimates
have been obtained by taking values of $y$ corresponding to the observed $p$, and weighting
coefficients corresponding to these $y$. Thus from Table XVI of Finney (1952) the $y$ value
corresponding to 1 - 0.0344 is $-3.37$, and from Table XVII the corresponding weighting
coefficient is 0.033, so that $W$ = 0.033 × 844 = 28. The values of $y_1 - y_2$, $W_1 + W_2$ and
$W_1W_2/(W_1 + W_2)$ are then calculated. These last are used as weights in calculating the weighted
mean of $y_1 - y_2$, namely, +0.2990. The equations for $t_1'$ and $t_2'$ are therefore
$$332t_1' + 290t_2' = -6505.33,$$
$$t_1' - t_2' = +0.2990.$$
The numerical term of the first equation is obtained from the sum of products of the $y$ and
$W$ columns less those of the $l$ and $W_1 + W_2$ columns.
Solution in the ordinary manner gives
$$t_1' = -10.3193, \qquad t_2' = -10.6183.$$
Equations (7) then give the values of $m_r' + l_r$ shown in Table 4.
Provisional values $Y_{1r}$ and $Y_{2r}$ are now calculated from equations (3). Thus
$$Y_{11} = -10.3193 + 7.0357 = -3.28.$$
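
The arithmetic of this step can be checked with a few lines. The sketch below uses the $y_1 - y_2$ and $W_1W_2/(W_1 + W_2)$ columns of Table 4 together with the two coefficients and the numerical term quoted above; the variable names are illustrative only.

```python
# First-estimate columns of Table 4, experiments 1 to 6.
diff = [0.16, 0.48, 0.39, -0.04, 0.08, 0.52]   # y1 - y2
u = [17.4, 19.7, 51.1, 17.2, 18.3, 18.3]       # W1*W2 / (W1 + W2)

# Weighted mean of y1 - y2, i.e. the first estimate of t1 - t2.
d = sum(a * b for a, b in zip(diff, u)) / sum(u)

# Solve 332 t1 + 290 t2 = -6505.33 together with t1 - t2 = d.
t2 = (-6505.33 - 332 * d) / (332 + 290)
t1 = t2 + d

# Provisional value for treatment 1 in experiment 1, using m' + l = 7.0357.
Y11 = t1 + 7.0357
print(round(d, 4), round(t1, 4), round(t2, 4), round(Y11, 2))
```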


The working values and weights are calculated as explained above, using the appropriate
table (Table XVII of Finney). It may be noted that if, as here, interpolation is required,
it is easiest to calculate two values of $y$ and interpolate between these.

The remainder of the calculations proceed exactly as before, and lead to the second
estimates shown in Table 4.



Table 4. Analysis of Drosophila data

Treatment 1, basic data (the 1st and 2nd estimate columns are not legible in the source)

Exp.     n_1     100p_1
1         844     3.44
2         270     2.89
3        3285     3.35
4        1003     2.35
5         370     8.38
6         600     9.50

Treatment 2

                           1st estimates     2nd estimates
Exp.     n_2     100p_2    y_2               y_2        W_2
1        1531     2.94                       -3.509
2        1465     1.84     -4.01             -3.979      30.3
3        4381     2.98     -3.76             -3.767     106.0
4        2105     2.47     -3.68             -3.683      47.0
5         550     7.82     -2.51             -2.504      39.4
6         463     5.83     -2.82             -2.803      31.0

Calculation of estimates

Exp.    l_r       y_1 - y_2   W_1 + W_2   W_1W_2/(W_1 + W_2)   m_r' + l_r    m_r
1       6.8669    +0.16         74          17.4                7.0357      +0.1874
2       6.8669    +0.48        107          19.7                6.6853      -0.1137
3       6.8669    +0.39        205          51.1                6.9062      +0.0341
4       6.8669    -0.04         77          17.2                6.8238      -0.0598
5       8.0064    +0.08         75          18.3                8.0149      +0.0070
6       8.0064    +0.52         84          18.3                7.9483      -0.0571
                  +0.2990      622         142.0

                       1st estimates    2nd estimates
t_1                    -10.3193         -10.3265
t_2                    -10.6183         -10.6202
t_1 - t_2               +0.2990          +0.2937
Sum of W_1 + W_2         622              629.6


From the small changes in the values of $y_1$ and $y_2$ and the still smaller change (from
+0.2990 to +0.2937) in $t_1 - t_2$, which is trivial compared with the range of the values of
which it is the weighted mean, it is clear that no further approximation is required.
(A further cycle actually gives a value of +0.29341 for $t_1 - t_2$.)

The advantage of deriving the first estimates by weighting the directly transformed
proportions will now be apparent. The computations follow the same routine in each case
(except for the calculation of working values) and a very satisfactory first approximation is
obtained. Moreover, the similarity of the figures in the two sets of computations provides
a useful check against gross errors.*

The maximum-likelihood solution may be compared with the large-sample solution
already obtained. For comparative purposes a correction to $E$ is required to allow for the
fact that squares of $\Delta\theta_1$ and $\Delta\theta_2$ have been neglected in the large-sample solution. This is
obtained by multiplying each component of $E$ by the appropriate value of $[1 + \tfrac{1}{2}(p_1 + p_2)]$.
With this adjustment, and the corresponding adjustment in the weights, the large-sample
estimate becomes 0.2930, with weight 147.8. The maximum-likelihood solution gives
$\theta_1/\theta_2 = 1.3414$, so that $(\theta_1 - \theta_2)/\tfrac{1}{2}(\theta_1 + \theta_2) = 0.2916$. The agreement between the two methods
of estimation is thus very close, the discrepancy being less than 1/50 of the standard error.
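
The comparison can be reproduced from the quoted figures. The short check below assumes that $\theta_1/\theta_2$ is obtained as $e^{t_1 - t_2}$ with the second estimate $t_1 - t_2 = 0.2937$, and that the standard error of the large-sample estimate is the reciprocal square root of its weight.

```python
import math

ratio = math.exp(0.2937)                        # theta_1 / theta_2
relative = (ratio - 1) / (0.5 * (ratio + 1))    # (theta_1 - theta_2) / mean of thetas
standard_error = 1 / math.sqrt(147.8)           # from the large-sample weight
discrepancy = abs(0.2930 - relative)

print(round(ratio, 4), round(relative, 4), discrepancy / standard_error)
```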


ESTIMATION OF ERROR AND TESTS OF SIGNIFICANCE 


Three types of test will commonly be required: 

(1) A test of the difference between the two treatments, i.e. a test of whether $t_1 - t_2$ is
significantly different from zero. Whether or not the difference is significant the estimated
standard error of $t_1 - t_2$ should be given so that approximate fiducial limits can be assigned.

(2) A test whether there is any real variation in sensitivity of experiments, i.e. whether 
the differences between the m, are significant. 

(3) A test whether the residual variation is greater than can be accounted for by binomial 
sampling. 

In the analysis of normally distributed data with variances in known ratios, i.e. with 
known relative weights, any set of parameters of which estimates have been obtained by 
the method of least squares can be tested for significance by finding the difference in the 
sum of squares accounted for when all parameters are included in the specification and that 
accounted for when the parameters under test are omitted from the specification. This 
difference is tested against the residual sum of squares by means of the z distribution. For
the corresponding $\chi^2$ test on quantal material, however, the relevant $\chi^2$ cannot be exactly
determined in this way, since changes in specification result in changes in the expectations,
which in turn alter the expected variances.

* In general we may expect to save at least a cycle of iteration by initial weighting. In Dyke and
Patterson's example, for instance, the first estimates obtained in Sampling Methods by weighting are
fully as accurate as Dyke & Patterson's second estimates.

A simple example will illustrate this point. Suppose we have two sets of $\tfrac{1}{2}n$ observations
($n$ even), which are known to be derived from Poisson distributions whose means $\mu_1$ and $\mu_2$
are possibly different. If $\mu_1 = \mu_2$ the value of $\chi^2$, $\chi_T^2$ say, based on the whole of the data will
be approximately distributed as $\chi^2$ with $n - 1$ d.f. Likewise the value of $\chi^2$, $\chi_D^2$ say,
calculated from the difference of the sums of the two sets of observations, will be distributed
as $\chi^2$ with 1 d.f. Whether or not $\mu_1 = \mu_2$ variation of the observations within sets will give
two values of $\chi^2$, $\chi_1^2$ and $\chi_2^2$ say, each of which will be distributed as $\chi^2$ with $\tfrac{1}{2}n - 1$ d.f. It can
easily be shown that
$$\chi_T^2 = \chi_D^2 + \chi_1^2 + \chi_2^2 + \frac{\bar{x}_1 - \bar{x}_2}{2\bar{x}}(\chi_1^2 - \chi_2^2),$$
where $\bar{x}_1$ and $\bar{x}_2$ are the observed means of the two sets and $\bar{x} = \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)$. If instead of
$\chi_D^2$ the difference between the total and residual $\chi^2$, i.e. $\chi_T^2 - \chi_1^2 - \chi_2^2$, is taken as the criterion
for testing the hypothesis $\mu_1 = \mu_2$, the discrepancy given by the last term of the above
equation will be obtained.
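
The partition can be verified numerically. The sketch below uses two hypothetical sets of five counts each and takes each $\chi^2$ as the usual index of dispersion, $\Sigma(x - \bar{x})^2/\bar{x}$, with the 1 d.f. component computed as $(S_1 - S_2)^2/(S_1 + S_2)$ from the two totals; these definitions are assumptions about how the quantities in the text are formed.

```python
def dispersion_chi2(xs):
    """Index of dispersion: sum of squared deviations divided by the mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / mean


# Two hypothetical sets of counts of equal size.
set1 = [3, 5, 2, 6, 4]
set2 = [7, 9, 6, 10, 8]

chi_total = dispersion_chi2(set1 + set2)
chi_1 = dispersion_chi2(set1)
chi_2 = dispersion_chi2(set2)
s1, s2 = sum(set1), sum(set2)
chi_diff = (s1 - s2) ** 2 / (s1 + s2)

x1, x2 = s1 / len(set1), s2 / len(set2)
xbar = 0.5 * (x1 + x2)
last_term = (x1 - x2) * (chi_1 - chi_2) / (2 * xbar)

print(chi_total, chi_diff + chi_1 + chi_2 + last_term)
print("discrepancy from using total minus residual:", last_term)
```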

It is therefore advisable to use direct tests for each set of parameters, instead of relying
on differences between $\chi^2$ values. The general rule for obtaining the variances of the
estimates of parameters in maximum-likelihood solutions of this type is to invert the
matrix of the coefficients of the final set of normal equations, thus obtaining a $c$ matrix of
the customary type. The variances and covariances are then given directly by the $c$ values.

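In the present case the fact that one of the constants is redundant introduces a slight
additional complication. The procedure has been given by Yates & Hale (1939). Suppose
that the $l$ normal equations $E_1 = 0$, $E_2 = 0$, ..., $E_l = 0$ are governed by the identical
relationship
$$\lambda_1E_1 + \lambda_2E_2 + \ldots + \lambda_lE_l = 0, \qquad (8)$$
and that the relation
$$\mu_1b_1 + \mu_2b_2 + \ldots + \mu_lb_l = 0 \qquad (9)$$
is assumed between the estimates of the parameters. Then instead of replacing the numerical
terms of the normal equations by 1, 0, ..., 0 in order to obtain the equations for $c_{11}, c_{12}, \ldots, c_{1l}$,
they must be replaced by
$$1 - \frac{\lambda_1\mu_1}{S(\lambda\mu)}, \quad -\frac{\lambda_1\mu_2}{S(\lambda\mu)}, \quad \ldots, \quad -\frac{\lambda_1\mu_l}{S(\lambda\mu)}.$$
Similarly the equations for $c_{21}, c_{22}, \ldots, c_{2l}$ will be obtained by replacing the numerical
terms by
$$-\frac{\lambda_2\mu_1}{S(\lambda\mu)}, \quad 1 - \frac{\lambda_2\mu_2}{S(\lambda\mu)}, \quad \ldots, \quad -\frac{\lambda_2\mu_l}{S(\lambda\mu)},$$
etc.*

Designating the normal equations of our example by the suffices $a, b, 1, 2, \ldots, k$, we have
$$-E_a - E_b + E_1 + E_2 + \ldots + E_k = 0,$$
so that $\lambda_a = \lambda_b = -1$, $\lambda_1 = \lambda_2 = \ldots = \lambda_k = 1$.
Also from (4) $\mu_a = \mu_b = 0$, $\mu_r = W_{1r} + W_{2r}$ ($r = 1, \ldots, k$).

Consequently $S(\lambda\mu) = \Sigma(W_{1r} + W_{2r})$. The correction terms to be added to each of the first two
sets of auxiliary equations are therefore
$$0, \quad 0, \quad g_1, \quad g_2, \quad \ldots, \quad g_k,$$
where $g_r = (W_{1r} + W_{2r})/(W_1 + W_2)$, $W_1$ and $W_2$ denoting $\Sigma W_{1r}$ and $\Sigma W_{2r}$. For the remaining
$k$ sets the same correction terms are subtracted.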
Tests for the t's
The $c$'s can now be easily evaluated. All that is necessary is to make the appropriate
changes in the numerical terms of the normal equations and utilize the solution already
obtained. Putting
$$U = \Sigma\frac{W_{1r}W_{2r}}{W_{1r} + W_{2r}}$$
we find
$$c_{aa} = V(t_1) = \frac{1}{W_1 + W_2} + \frac{W_2^2}{U(W_1 + W_2)^2},$$
$$c_{ab} = \text{cov}(t_1, t_2) = \frac{1}{W_1 + W_2} - \frac{W_1W_2}{U(W_1 + W_2)^2}.$$
Hence
$$V(t_1 - t_2) = \frac{1}{U},$$
as could, indeed, be deduced directly from the form of the expression for $t_1 - t_2$, and
$$V\{\tfrac{1}{2}(t_1 + t_2)\} = \frac{1}{W_1 + W_2} + \frac{(W_1 - W_2)^2}{4U(W_1 + W_2)^2}.$$

* The process can be extended to two (or more) redundant constants by introducing additional
correction terms $-\lambda_i'\mu_j'/S(\lambda'\mu')$, etc., provided that all the sums $S(\lambda\mu')$,
$S(\lambda'\mu)$, etc., are zero. The Yates & Hale paper is incorrect in that it [...] assigns
a numerical term instead of zero to equation (9).



Test for the m's
Using a similar procedure for set $r$, and putting
$$T_r = \frac{1}{U}\,\frac{W_{1r}W_2 - W_{2r}W_1}{(W_{1r} + W_{2r})(W_1 + W_2)},$$
we find
$$c_{rr} = V(m_r) = \frac{1}{W_{1r} + W_{2r}} - \frac{1}{W_1 + W_2} + UT_r^2,$$
$$c_{rs} = \text{cov}(m_r, m_s) = -\frac{1}{W_1 + W_2} + UT_rT_s.$$
This gives
$$V(m_r - m_s) = \frac{1}{W_{1r} + W_{2r}} + \frac{1}{W_{1s} + W_{2s}} + U(T_r - T_s)^2,$$
which, with a certain amount of algebra, can be shown to be equal, when there are only the
two experiments $r$ and $s$, to
$$1 \Big/ \left\{\frac{W_{1r}W_{1s}}{W_{1r} + W_{1s}} + \frac{W_{2r}W_{2s}}{W_{2r} + W_{2s}}\right\}.$$
This provides a check, since it corresponds to the expression for $V(t_1 - t_2)$ for two experiments,
with interchange of treatments and experiments.

Apart from the $T$ terms the variances and covariances of the $m$'s are those of quantities
$M_r - \bar{M}$, where the $M_r$ are independently distributed with variances $1/(W_{1r} + W_{2r})$ and $\bar{M}$ is
the weighted mean of the $M_r$. If the true values of the $m$'s are all zero, therefore, the
weighted sum of squares
$$\Sigma(W_{1r} + W_{2r})m_r^2$$
will be distributed as $\chi^2$ with $k - 1$ d.f., apart from the disturbance due to the $T$ terms. The
weighted sum of squares has the expectation
$$\Sigma(W_{1r} + W_{2r})V(m_r) = k - 1 + U\Sigma T_r^2(W_{1r} + W_{2r}).$$
If the $T$ terms are not sufficiently small to be neglected entirely the value of the second term
may be deducted before making the $\chi^2$ test, but this correction will usually be small. Since
the second term is always positive it need not in any case be calculated if significance is not
attained before taking it into account.
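
The variance formulae for the $t$'s and $m$'s can be checked without inverting any matrix, since the estimates given by (5) to (7) are linear in the working values. The sketch below assumes each working value has variance $1/W$, sets the $l$ terms to zero, and uses hypothetical weights; the function names are illustrative. The definition of $T_r$ in the code (in particular its sign) is an assumption, but only $T_r^2$ enters the variances.

```python
def fit(y1, y2, W1, W2):
    """Solve equations (5)-(7) with all l_r = 0; returns [t1, t2, m_1, ..., m_k]."""
    S1, S2 = sum(W1), sum(W2)
    u = [a * b / (a + b) for a, b in zip(W1, W2)]
    d = sum(ui * (a - b) for ui, a, b in zip(u, y1, y2)) / sum(u)   # equation (6)
    rhs = sum(w * y for w, y in zip(W1 + W2, y1 + y2))               # equation (5)
    t2 = (rhs - S1 * d) / (S1 + S2)
    t1 = t2 + d
    m = [(a * (ya - t1) + b * (yb - t2)) / (a + b)                   # equation (7)
         for a, b, ya, yb in zip(W1, W2, y1, y2)]
    return [t1, t2] + m


def propagated_variances(W1, W2):
    """Variances of t1, t2, m_r and of t1 - t2, taking var(y) = 1/W."""
    k = len(W1)
    var = [0.0] * (k + 2)
    var_diff = 0.0
    for i, w in enumerate(W1 + W2):
        unit = [0.0] * (2 * k)
        unit[i] = 1.0                              # linearity: a unit working value
        coef = fit(unit[:k], unit[k:], W1, W2)     # yields the coefficient of y_i
        for j, c in enumerate(coef):
            var[j] += c * c / w
        var_diff += (coef[0] - coef[1]) ** 2 / w
    return var, var_diff


# Hypothetical weights for k = 3 experiments (not the Drosophila values).
W1 = [28.0, 9.0, 110.0]
W2 = [45.0, 26.0, 101.0]
S1, S2 = sum(W1), sum(W2)
U = sum(a * b / (a + b) for a, b in zip(W1, W2))
T = [(a * S2 - b * S1) / (U * (a + b) * (S1 + S2)) for a, b in zip(W1, W2)]

var, var_diff = propagated_variances(W1, W2)
closed_t1 = 1 / (S1 + S2) + S2 ** 2 / (U * (S1 + S2) ** 2)
closed_m = [1 / (a + b) - 1 / (S1 + S2) + U * t * t for a, b, t in zip(W1, W2, T)]
print(var[0], closed_t1)
print(var[2:], closed_m)
print(var_diff, 1 / U)
```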




Residual $\chi^2$
The general formula for the residual $\chi^2$ is the sum over all cells of
$$\frac{(\text{observed no.} - \text{expected no.})^2}{\text{expected no.}}.$$
If, as in the present case, the data consist of a set of observed proportions, this reduces to
$$\chi^2 = S\left\{\frac{(a - m)^2}{m} + \frac{(a - m)^2}{n - m}\right\},$$
where $a/n = p$ is the observed proportion, $m/n = P = 1 - Q$ the expected proportion given
by the maximum-likelihood solution, and $S$ denotes summation over all proportions.
We may denote the provisional and working values and weights used for the $s$th estimates
by $Y_s$, $y_s$ and $W_s$. The provisional values $Y_s$ will equal the expected values derived from the
$(s - 1)$th estimates. We have
$$y_s = Y_s + (p - P)\frac{dY}{dP}.$$
Consequently the residual $\chi^2$ based on the $(s - 1)$th estimates is given by
$$(\chi^2)_{s-1} = S\left\{\frac{n}{PQ}\left(\frac{dP}{dY}\right)^2(y_s - Y_s)^2\right\} = SW_s(y_s - Y_s)^2. \qquad (10)$$
The residual $\chi^2$ obtained from the expected values derived from the $s$th estimates is
$$(\chi^2)_s = SW_{s+1}(y_{s+1} - Y_{s+1})^2. \qquad (11)$$
If the estimation process is stopped after the $s$th estimates it will be necessary to calculate
specially $Y_{s+1}$, $y_{s+1}$ and $W_{s+1}$ to evaluate $(\chi^2)_s$. Usually, however, $(\chi^2)_{s-1}$ will be sufficiently
accurate.
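
Expression (10) can be illustrated numerically. The sketch below assumes the complementary log log transformation and the weight $W = n(dP/dY)^2/PQ$, and uses hypothetical sample sizes, proportions and provisional values; it computes the residual $\chi^2$ both directly from the proportions and from the working values.

```python
import math


def residual_chi2_two_ways(n, p, Y):
    """Return (S n (p - P)^2 / PQ, S W (y - Y)^2) for the complementary log log case."""
    direct = 0.0
    from_working_values = 0.0
    for n_i, p_i, Y_i in zip(n, p, Y):
        Q = math.exp(-math.exp(Y_i))
        P = 1.0 - Q
        dP_dY = math.exp(Y_i) * Q
        y_i = Y_i + (p_i - P) / dP_dY            # working value
        W_i = n_i * dP_dY ** 2 / (P * Q)         # weight
        direct += n_i * (p_i - P) ** 2 / (P * Q)
        from_working_values += W_i * (y_i - Y_i) ** 2
    return direct, from_working_values


# Hypothetical data: three observed proportions with their provisional values.
n = [844, 1531, 270]
p = [0.0344, 0.0294, 0.0289]
Y = [-3.28, -3.58, -3.63]
direct, from_working_values = residual_chi2_two_ways(n, p, Y)
print(direct, from_working_values)
```

Each term $W(y - Y)^2$ is the contribution of one proportion, so unusually large terms point directly to aberrant values.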
Finney (1952) recommends the use of the ordinary formula for the sum of the squares of
the residuals when regression coefficients or constants $b_1, b_2, \ldots$ are fitted by least squares:
$$SW(y - Y)^2 = SWy^2 - b_1B_1 - b_2B_2 - \ldots,$$
where $B_1, B_2, \ldots$ are the numerical terms of the normal equations with leading terms $b_1, b_2, \ldots$
A little consideration will show that if the $s$th estimates of the parameters and the numerical
terms of the $s$th set of normal equations, together with $y_s$ and $W_s$, are used to calculate the
right-hand side of the above equation, we shall obtain exactly
$$SW_s(y_s - Y_{s+1})^2.$$
This, however, is not likely to give a substantially closer approximation than expression
(10). The latter is easier to compute, and also has the advantage of revealing aberrant
values.

When some of the expectations are very small it is well known that the $\chi^2$ test is
unreliable. In particular, positive deviations from very small expectations will make
excessively large contributions to the residual $\chi^2$. Under such circumstances Fisher (1950) has
shown that $2S\{a\log_e(a/m)\}$, tested against the ordinary $\chi^2$ distribution with the appropriate
number of degrees of freedom, gives a much better measure of the significance of the
deviations, and is well fitted to take the place of $\chi^2$ when the expectations are small. An
example of its use is given in example (c) below. This criterion requires somewhat more
computation for its evaluation than $\chi^2$, since it cannot be simply derived from the
transformed values. In our notation the contribution of each observed proportion is
$$2n\{p(\log_e p - \log_e P) + q(\log_e q - \log_e Q)\}.$$
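
A minimal sketch of the likelihood criterion follows, with hypothetical data and illustrative function names. It sums $2a\log_e(a/m)$ over both classes (responding and not responding) of every observed proportion, treating a class with $a = 0$ as contributing nothing, which is the limiting value of $a\log_e a$.

```python
import math


def likelihood_criterion(n, a, P):
    """2 S{a log_e(a/m)} summed over both classes of each observed proportion."""
    total = 0.0
    for n_i, a_i, P_i in zip(n, a, P):
        cells = [(a_i, n_i * P_i), (n_i - a_i, n_i * (1.0 - P_i))]
        for observed, expected in cells:
            if observed > 0:                      # a log(a/m) tends to 0 as a tends to 0
                total += 2.0 * observed * math.log(observed / expected)
    return total


def pearson_chi2(n, a, P):
    """Ordinary chi-square for the same proportions, for comparison."""
    return sum((a_i - n_i * P_i) ** 2 / (n_i * P_i * (1.0 - P_i))
               for n_i, a_i, P_i in zip(n, a, P))


# Hypothetical data: the second proportion has a very small expectation.
n = [50, 40]
a = [5, 3]
P = [0.08, 0.01]
print(likelihood_criterion(n, a, P), pearson_chi2(n, a, P))
```

With a small expectation, as in the second proportion here, the ordinary $\chi^2$ contribution is much the larger of the two, which is the behaviour the text describes.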


ESTIMATION OF ERROR AND TESTS OF SIGNIFICANCE IN THE DROSOPHILA DATA 


The above formulae can now be applied to the Drosophila data. Taking the values of the
weights used for deriving the second estimates (Table 4) we have
$$V(t_1 - t_2) = 1/149.96 = 0.0817^2.$$
Hence $t_1 - t_2 = +0.2937 \pm 0.0817$. Reference to the normal probability integral gives
$P = 0.000163$. Hence the significance of the difference is clearly established. If the actual
mutation rates are of interest the standard errors of $t_1$, $t_2$ and $\tfrac{1}{2}(t_1 + t_2)$ will also be required.
These are given by
$$V(t_1) = \frac{1}{629.6} + \frac{295.8^2}{149.96 \times 629.6^2} = 0.0553^2,$$
$$V(t_2) = \frac{1}{629.6} + \frac{333.8^2}{149.96 \times 629.6^2} = 0.0588^2,$$
$$V\{\tfrac{1}{2}(t_1 + t_2)\} = \frac{1}{629.6} + \frac{(333.8 - 295.8)^2}{4 \times 149.96 \times 629.6^2} = 0.0399^2.$$

The test for variation in sensitivity between the experiments is equivalent to testing that
the true values of $m_r$ are zero. We have
$$\Sigma(W_{1r} + W_{2r})m_r^2 = 4.745,$$
which has to be compared with $\chi^2$ with 5 d.f. There is therefore no evidence of
heterogeneity between experiments. Taking the $T_r$ into account increases the expectation of the
above expression by 0.0458. Clearly the $T_r$ are of no consequence here.

Using expression (10) and the provisional and working values of the second approximation
gives a value of $\chi^2$ of 5.15. Since the maximum-likelihood solution is very close to the
minimum $\chi^2$ solution, it is to be expected that this value of the residual $\chi^2$ will be slightly too
large, and in fact the provisional and working values of the third approximation (equivalent
to the second approximation estimates) give a value of 5.11. The value based on the
provisional and working values of the second approximation is clearly adequate for practical
purposes.
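
The standard errors quoted in this section can be recomputed from the weight totals 333.8 and 295.8 and from $U = 149.96$. The check below also evaluates the tail of the normal integral for the treatment difference; taking the single upper tail is an assumption about how the quoted $P$ was obtained.

```python
import math

# Totals of the second-estimate weights and the value of U used in the text.
W1, W2, U = 333.8, 295.8, 149.96
S = W1 + W2

se_diff = math.sqrt(1 / U)
se_t1 = math.sqrt(1 / S + W2 ** 2 / (U * S ** 2))
se_t2 = math.sqrt(1 / S + W1 ** 2 / (U * S ** 2))
se_mean = math.sqrt(1 / S + (W1 - W2) ** 2 / (4 * U * S ** 2))

z = 0.2937 / se_diff
p_one_tail = 0.5 * math.erfc(z / math.sqrt(2))   # upper tail of the normal integral

print(round(se_diff, 4), round(se_t1, 4), round(se_t2, 4), round(se_mean, 4))
print(p_one_tail)
```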


OTHER EXAMPLES 


As further examples, which bring out some additional points, we may take the three sets
of data given by E. S. Pearson (1950).


(a) Road research data


Table 5 gives the number of cycles and motor-cycles involved in accidents causing
personal injury on sections (6 to 15 miles long) of five main roads near London, together
with estimates of the miles travelled by these two types of vehicle on these sections. A
comparison of the probabilities of being involved in an accident per vehicle mile is required.

Since a vehicle which is involved in an accident is either removed from the road or is
again exposed to risk of further accident the number of accidents will follow a Poisson
distribution. If the probability of accident per vehicle mile is $\theta$ the expected number of
accidents per $n$ vehicle miles will be $n\theta$. Following the procedure of the previous example,
we may denote the probability of accident for cycles and motor-cycles on road $r$ by $\mu_r\theta_1$
and $\mu_r\theta_2$. This will make allowances for differences between the different roads which affect
both probabilities proportionally.


Table 5. Road Research data

               Cycles                                Motor-cycles
Road    Accidents   Miles/10^3   p           Accidents   Miles/10^3   p
1       5           1690         0.0030      4           800          0.0050
2       2           1800         0.0011      2           500          0.0040
3       3           1820         0.0016      1           330          0.0030
4       4           1790         0.0022      3           570          0.0053
5       3           1560         0.0019      5           540
Total               8660


The Poisson distribution with mean $m$ can be regarded as the limit of the binomial
distribution $(q + p)^n$, in which $p \to 0$ and $n \to \infty$ in such a manner that $np = m$. Consequently,
provided the unit of distance travelled is taken sufficiently small for the chance of two
accidents occurring in the same unit to be negligible, the data can be treated as if they were
binomial. Here a unit of 1000 vehicle miles is sufficiently small. As before the complementary
log log transformation is appropriate. Since
$$y = \log_e(-\log_e q) \to \log_e p$$
as $p \to 0$, a table of natural logarithms can be used for the actual transformation. Working
values and weighting coefficients can be obtained from Finney's Table IX, which has
adequate range for this purpose. Alternatively, if $x$ is the number of occurrences a table of
natural logarithms may be used with
$$W = nw = ne^Y,$$
$$y = Y - 1 + x/W.$$

With this procedure there is no need to choose units such that $p = x/n$ is small.
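
The Poisson form of the working value can be tried on the figures for cycles on the first road ($n = 1690$, 5 accidents, provisional value $Y = -6.0$). The loop below is only an illustration of how the working value behaves: with a single observation, taking the working value as the next provisional value leads to $\log_e(x/n)$. The function name is hypothetical.

```python
import math


def poisson_working_value(Y, x, n):
    """Working value y = Y - 1 + x/W and weight W = n e^Y for a Poisson count x."""
    W = n * math.exp(Y)
    y = Y - 1.0 + x / W
    return y, W


first_y, first_W = poisson_working_value(-6.0, 5, 1690)
print(round(first_y, 3), round(first_W, 2))

# Repeating the step with a single observation approaches log_e(x/n).
Y = -6.0
for _ in range(5):
    Y, W = poisson_working_value(Y, 5, 1690)
print(Y, math.log(5 / 1690))
```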

The computations follow the same lines as those of the previous example, and need not
be reproduced in detail here. For cycles on the first road, for instance,
$$n = 1690, \quad n' = 5, \quad p = 0.0030, \quad \log_e p = -5.81,$$
and we therefore have:

1st estimate
2nd estimate

the provisional value $Y = -6.0$ being given by the first estimates. For the first
estimate $W = n'$, since $w = P = p$.




The values of the first and second estimates of the parameters are shown in Table 6. The
second set of estimates differs little from the first and is clearly sufficiently accurate.

The average difference is clearly significant, motor-cycles being estimated as about 2.6
times as liable as cycles to be involved in an accident per 1000 vehicle miles. Taking ±1.65
times the standard error, we obtain approximate 5 % lower and upper limits of error of
this ratio as 1.5 and 4.7.
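
The ratio and its limits can be roughly reproduced as follows. This is a sketch under two assumptions: that the ratio is formed from the estimated risks 0.00209 and 0.00553 quoted later in this section, and that the standard error of the difference on the logarithmic scale is 0.356, as read from Table 6. The limits are formed on the logarithmic scale and then transformed back.

```python
import math

ratio = 0.00553 / 0.00209          # motor-cycles relative to cycles
log_ratio = math.log(ratio)
standard_error = 0.356             # S.E. of the difference, as read from Table 6

lower = math.exp(log_ratio - 1.65 * standard_error)
upper = math.exp(log_ratio + 1.65 * standard_error)
print(round(ratio, 2), round(lower, 2), round(upper, 2))
```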

The values of the various $\chi^2$ are shown in Table 7. There is no evidence of variation in risk
between the different roads, or of any variation of the difference between cycles and
motor-cycles.


Table 6. Road Research data: 1st and 2nd estimates


m, +0-1420 +0-1304 
ms — 0-4930 — 0-4938 
т» —0-3671 — 0:3633 


— 0:0020 


+0:0117 


S.E. (4 — t) — + 0:356 +0-2423 


+0-2726 


                                        D.F.    $\chi^2$
Average difference ($t_1 - t_2$)          1       7.41
Variation between roads ($m$'s)           4       2.12
Variation in difference (residual)        4       1.40*
                                          9      10.93


* From $SW_3(Y_3 - y_3)^2$. $SW_2(Y_2 - y_2)^2$ gives 1.38.


That there is no appreciable variation between roads can in fact be seen by inspection of
the values of $p_1$ and $p_2$ in the original data; such variation would result in a positive
correlation between the values of $p_1$ and $p_2$. This fact suggests a short cut to the whole analysis,
since if there is no variation between roads or in the difference between cycles and
motor-cycles the data for each type of vehicle may be pooled. A combined test for absence of such
variation may be quickly made from the $\chi^2$ for the variation of the ratios for each vehicle
separately. These $\chi^2$ can be calculated in the ordinary manner and are found to be:


                D.F.    $\chi^2$
Cycles           4       1.67
Motor-cycles     4       2.01




Using the pooled data we may estimate immediately that cycles and motor-cycles have
risks of 0.00196 and 0.00548 per 1000 vehicle miles of being involved in an accident. These
agree closely with the estimates of 0.00209 and 0.00553 given by the previous analysis.

The possibility of pooling fragmentary data should always be examined. If pooling is 
justified the numerical work is considerably reduced and the presentation of the data is 
simplified. In addition, the amount of information is increased when heterogeneity between 
the different sets of data can be ignored. 
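
The pooled estimates and the two $\chi^2$ for variation between roads can be recomputed from Table 5. The sketch below takes each $\chi^2$ as the sum of (observed - expected)$^2$/expected, with the expected number of accidents on each road proportional to the miles travelled; the function names are illustrative.

```python
# (accidents, thousands of vehicle miles) for the five roads of Table 5.
cycles = [(5, 1690), (2, 1800), (3, 1820), (4, 1790), (3, 1560)]
motor_cycles = [(4, 800), (2, 500), (1, 330), (3, 570), (5, 540)]


def pooled_rate(data):
    """Accidents per 1000 vehicle miles from the pooled totals."""
    return sum(x for x, _ in data) / sum(n for _, n in data)


def homogeneity_chi2(data):
    """Chi-square (d.f. = number of roads - 1) for variation of the rate between roads."""
    rate = pooled_rate(data)
    return sum((x - n * rate) ** 2 / (n * rate) for x, n in data)


print(pooled_rate(cycles), pooled_rate(motor_cycles))
print(homogeneity_chi2(cycles), homogeneity_chi2(motor_cycles))
```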

(b) Artificial insemination data 

These data (Table 8) are the result of an investigation (Rothschild, 1949) to test whether 
the passage of a small electric current for the measurement of the activity of the spermatozoa 
had any adverse effect on the fertility of bull semen. Seven samples were each divided into 
two parts, one being subjected to the electric current. The half-samples were then diluted 
and used to inseminate heifers. The columns headed ‘returns’ give the numbers of heifers 
which did not become pregnant, as judged by their being returned by their owners for 
reinsemination. 


Table 8. Artificial insemination data 


[Body of Table 8 not recoverable; surviving headings: Sample, Untreated semen.]


Inspection of the individual $p$ values gives no indication of any heterogeneity between
samples. The uniformity of the data is confirmed by the $\chi^2$ between samples for treated and
untreated semen separately. The values are:




The residual $\chi^2$ is, however, unreliable in this set of data owing to the small expectations
of the non-zero classes of explosive A (ignitions) in Exps. 4 and 5. The likelihood criterion
$2S\{a\log_e(a/m)\}$ has the value of 13.41, giving a significance level $P = 0.020$. This may be
compared with the significance level obtained by combining the data into the two groups,
namely, Exps. 4 and 5 and Exps. 1, 2, 3 and 6. (‘This particular combination is valid for the 
purpose of a test of significance, since 4 and 5 are the two experiments with low values of 
п т.) A 2x 2x 2 table then results. For the interaction of such a table, as Bartlett (1935) 
has shown, the exact probabilities of the various possible occurrences with all marginal 
totals fixed can be calculated in a similar manner to that followed in a 2 x 2 table. The 
resultant test is in fact a test for the constancy of the differences of the logits.* Bartlett’s 
test gives a probability of 0-0147 of two or more of the three ignitions in Exps. 4 and 5 being 
associated with explosive A. Bearing in mind (a) that practically the whole of the dis- 
crepancy is in this test concentrated in a single degree of freedom, but that, on the other hand 
(b) there is a large effect of discontinuity, not present in the likelihood criterion, the agree- 
ment between the two tests is satisfactory. 

The indication of the residual $\chi^2$ that these experiments are anomalous is therefore
confirmed, though the significance level is much less than that given by the $\chi^2$ value. This
result is not, however, of great consequence, since it merely provides evidence of the
incorrectness of the supposition that the difference in logits between the two explosives is
constant over the whole of the probability scale. Our conclusions from these data should
be that explosive B produces significantly fewer ignitions than explosive A when conditions
are such that the probability of ignition is substantial, but that there is no evidence of any
difference when the probability is low. Substantially more data at low probabilities would
be required, however, to reach any firm conclusions as to what is really happening in this
region.

SUMMARY 


The analysis of sets of quantal experiments with two treatments by the use of transforma- 
tions and maximum likelihood is discussed, and illustrated by examples, one of which is 
also treated by large-sample methods. Methods are given for (a) obtaining an estimate of 
the treatment difference in terms of the transformed variate, together with its standard
error, (b) testing for differences in sensitivity between the different experiments, and (c)
testing for variation in the treatment difference from experiment to experiment. The 
appropriateness of various transformations under different circumstances is also considered. 


* Simpson (1951), following a suggestion by Bartlett, has drawn attention to certain errors in the
latter's paper, which do not, however, affect the test for interaction. Lancaster (1951) has criticized
Bartlett's test on what seems to me the invalid ground that the components of $\chi^2$ associated with this
test are not additive.


REFERENCES


Bartlett, M. S. (1935). J. R. Statist. Soc. Suppl. 2, 248.

Bonnier, G. & Lüning, K. G. (1953). Hereditas, Lund., 39, 193.

Cochran, W. G. (1942). Iowa State Coll. J. Sci. 16, 421.

Cochran, W. G. (1952). Ann. Math. Statist. 23, 315.

Cochran, W. G. (1954). Biometrics, 10, 417.

Dyke, G. V. & Patterson, H. D. (1952). Biometrics, 8, 1.

Finney, D. J. (1952). Statistical Method in Biological Assay. London: Griffin.

Fisher, R. A. (1932). Statistical Methods for Research Workers (4th ed.). Edinburgh: Oliver and
Boyd.

Fisher, R. A. (1950). Biometrics, 6, 17.

Fisher, R. A. & Yates, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research
(4th ed. 1953). Edinburgh: Oliver and Boyd.

Jolly, G. M. (1950). Ann. Appl. Biol. 37, 597.

Lancaster, H. O. (1949). Biometrika, 36, 370.

Lancaster, H. O. (1951). J. R. Statist. Soc. B, 13, 242.

Pearson, E. S. (1950). Biometrika, 37, 383.

Pearson, K. (1933). Biometrika, 25, 379.

Rothschild, Lord (1949). J. Agric. Sci. 39, 294.

Simpson, E. H. (1951). J. R. Statist. Soc. B, 13, 238.

Yates, F. (1953). Sampling Methods for Censuses and Surveys (2nd ed.). London: Griffin.

Yates, F. (1955). Biometrika, 42, 404.

Yates, F. & Hale, R. W. (1939). J. R. Statist. Soc. Suppl. 6, 67.




A NOTE ON THE APPLICATION OF THE COMBINATION OF
PROBABILITIES TEST TO A SET OF 2 × 2 TABLES


By F. YATES, F.R.S.
Rothamsted Experimental Station 


INTRODUCTION 


The combination of probabilities test, or probability integral test, as it is also called, has 
been commonly used to give an overall test of significance for a set of 2 x 2 contingency 
tables for which pooling is not regarded as admissible because of heterogeneity of the data. 
Data of this type can, in my opinion, be more effectively analysed by the method of
maximum likelihood, using some appropriate transformation; the methods of analysis are
described in a separate paper (Yates, 1955). Cases will, however, arise where a quick,
possibly preliminary, test of significance is required, and for this purpose the combination
of probabilities test or some analogous test may be regarded as adequate. It therefore seems
worth putting on record certain points concerning tests of this type.

The combination of probabilities test requires the calculation of a significance level $P$
for each of the $k$ tables separately. The test criterion $S(-2\log_e P)$ is then compared with the
$\chi^2$ distribution with $2k$ degrees of freedom. We usually require to test whether the data
collectively show evidence of deviation in a common direction, and in such cases the
significance level $P$ must be that of the appropriate single tail of the corresponding distribution.
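
A minimal sketch of the test criterion is given below. It relies on the fact that for an even number of degrees of freedom $2k$ the upper tail of $\chi^2$ has the closed form $e^{-x/2}\sum_{i<k}(x/2)^i/i!$; the function names and the probabilities in the usage line are hypothetical.

```python
import math


def chi2_upper_tail_even(x, df):
    """Upper tail of chi-square with an even number of degrees of freedom df."""
    half = df // 2
    term = 1.0
    total = 1.0
    for i in range(1, half):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total


def combine_probabilities(ps):
    """Return (S(-2 log_e P), combined significance level) for single-tail levels ps."""
    criterion = sum(-2.0 * math.log(p) for p in ps)
    return criterion, chi2_upper_tail_even(criterion, 2 * len(ps))


criterion, combined = combine_probabilities([0.12, 0.08, 0.30, 0.04])
print(criterion, combined)
```

For a single table the combined level is simply the original $P$, and for two tables with product $\pi$ it is $\pi(1 - \log_e\pi)$, which gives a quick check on any implementation.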

The test is exact if the basic $P$ distributions are continuous with uniform probability,
i.e. if each $P$ can be regarded as uniformly distributed between 0 and 1 when the hypothesis
to be tested holds. If the $P$'s are derived from 2 × 2 contingency tables with small marginal
totals, however, the distributions of $P$ are by no means continuous. As has been pointed out
by Pearson (1950) they can be made continuous with uniform probability by the following
device. If for any particular 2 × 2 table the probability derived from the hypergeometric
series is $P_a(H)$ that the number in a particular cell is $\geq a$, and the probability of this number
being $\geq (a + 1)$ is $P_{a+1}(H)$, then if $a$ is the number observed we may use a value $P_{a'}(H)$
instead of $P_a(H)$, where $P_{a'}(H)$ is a random selection with uniform probability from
values in the range $P_a(H)$ to $P_{a+1}(H)$.

Since all the values of $P$ will be decreased by applying this adjustment, the value of
$S(-2\log_e P)$ will be increased. Without the adjustment, therefore, we shall consistently
over-estimate the combined probability of obtaining a set of $P$'s whose product is as small
as or smaller than that observed. This confirms the result obtained by Lancaster (1949) in
the course of a detailed investigation on the combination of probabilities from discrete
distributions.

To overcome this difficulty Lancaster suggested that the average value of $-2\log_e P_{a'}(H)$
be used instead of the value $-2\log_e P_a(H)$. This average value he denoted by $\chi_m^2$, but to avoid
confusion with the various $\chi^2$ with 1 d.f. appertaining to the individual experiments, we
will here denote it by $-2\log_e P_m(H)$. We have
$$-2\log_e P_m(H) = \chi_m^2 = E(-2\log_e P_{a'}(H))$$
$$= 2 - 2\{P_a(H)\log_e P_a(H) - P_{a+1}(H)\log_e P_{a+1}(H)\}/\{P_a(H) - P_{a+1}(H)\}.$$

As a further alternative, to save computation, Lancaster suggested what he called the median
value $\chi_{m'}^2$, which he defined by
$$\chi_{m'}^2 = -2\log_e \tfrac{1}{2}\{P_a(H) + P_{a+1}(H)\}, \qquad P_{a+1}(H) \neq 0,$$
$$\chi_{m'}^2 = 2 - 2\log_e P_a(H), \qquad P_{a+1}(H) = 0.$$

Neither of Lancaster's procedures, of course, gives an exact $\chi^2$ distribution with $2k$ d.f.,
but the departures will be very slight in all cases likely to be met with in practice. By
Pearson's procedure an exact $\chi^2$ distribution can be generated, but this introduces an
arbitrary element which, as Pearson himself recognized, may be regarded as altogether too
high a price to pay for a formally exact test.
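
Lancaster's two values can be sketched as follows, using the hypergeometric probabilities printed for distribution A in Table 1 and taking $P_a(H)$ as the probability of a number $\geq a$. The function names are illustrative. Because the mean value is the expectation of $-2\log_e$ of a uniformly distributed $P$ within each step, its average over the whole distribution is exactly 2, which gives a check on the calculation.

```python
import math

# Hypergeometric probabilities for a = 0, 1, ..., 8 (distribution A of Table 1).
probs = [0.003377, 0.029052, 0.107092, 0.220948, 0.279062,
         0.220948, 0.107092, 0.029052, 0.003377]


def tail(a):
    """P_a(H): probability that the number in the cell is >= a."""
    return sum(probs[a:])


def mean_value(Pa, Pa1):
    """Lancaster's mean value: the average of -2 log_e P over the range Pa1 to Pa."""
    if Pa1 == 0:
        return 2.0 - 2.0 * math.log(Pa)
    return 2.0 - 2.0 * (Pa * math.log(Pa) - Pa1 * math.log(Pa1)) / (Pa - Pa1)


def median_value(Pa, Pa1):
    """Lancaster's median value."""
    if Pa1 == 0:
        return 2.0 - 2.0 * math.log(Pa)
    return -2.0 * math.log(0.5 * (Pa + Pa1))


means = [mean_value(tail(a), tail(a + 1)) for a in range(len(probs))]
medians = [median_value(tail(a), tail(a + 1)) for a in range(len(probs))]
average = sum(p * m for p, m in zip(probs, means))
print([round(m, 3) for m in means])
print(round(average, 6))
```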


USE or y? TO DETERMINE THE VALUES OF P 


If the number of terms in the hypergeometric series appertaining to a single experiment is 
at all large the calculation of the exact probabilities is a tedious matter. An alternative 
procedure is to calculate the values of ҳ® for the individual experiments without the cor- 
rection for continuity, and use these to determine the corresponding one-tail probabilities. 
Cochran (1942, 1952), correcting a misleading statement by myself (1934), pointed out that 
the continuity correction should not be applied when summing д? from а number of 2 x 2 
tables, though the reason he gives in his 1952 paper, that ‘the correction has a tendency to 
over-correct’ and that ‘the over-correction mounts up in a disconcerting manner’ is not, 
as we have seen above, the basic one. Lancaster discussed the use of x? without correction for 
continuity for the combination of probabilities test, but confined his attention to the calcula- 
tion of the two-tail probabilities. Unfortunately, as already mentioned, this is not what is 
usually required in practice, and Lancaster’s argument is not applicable to the o 7 case. 

The question of the accuracy of the χ² approximation, therefore, still requires investigation.
When χ² corrected for continuity, χ_c² say, gives a good approximation to the one-tail
probabilities, it may be expected that the use of χ² without correction for continuity, χ_u² say,
for the calculation of the probabilities for combination will give a reasonable approximation
to the test based on the exact probabilities and −2 log_e P_m(H) as defined above. For if
P_a(χ_c) is the one-tail probability given by χ_c when the value a is observed, and P_{a+1}(χ_c) is
the corresponding probability given by χ_c for the value a + 1, then P_a(χ_u) is intermediate
between P_a(χ_c) and P_{a+1}(χ_c), and will not differ greatly from P_m(H).
It is well known, however, that if the relevant hypergeometric distribution is very
asymmetrical P(χ_c) is not a good approximation to P(H) (Yates, 1934). Consequently, if the
hypergeometric distributions for the separate tables are very asymmetrical, and particularly
if the asymmetry is in the same sense for all tables, the use of χ² without correction for
continuity instead of the exact probabilities may be expected to give misleading results.
Some comparisons made for actual 2 × 2 tables, however, are reassuring. Tables having
the following marginal totals (treatments 1 and 2) were taken:


               Table A                Table B                Table C
            1      2    Total      1      2    Total      1      2    Total
  +         a              8       a              8       a             16
  −                      192                    142                    224
  Total   100    100     200      50    100     150      40    200     240


406 Combination of probabilities test to a set of 2 х 2 tables 


Table A gives a symmetrical distribution and Tables B and C give increasing degrees of 
asymmetry. (No particular merit is claimed for these tables; A was originally chosen for 
another purpose, and B and C were taken to give distributions differing from A mainly in 
degree of symmetry.) 

For these three tables the values of −2 log_e P_m(H) and −2 log_e P(χ_u) were calculated
for all possible values of a. For Tables B and C there are two sets of values depending on the
direction in which the deviations are measured. The results are shown in Table 1. In
distribution A all the individual values except that for a = 8 (which will only occur in 1 out


Table 1. Comparison of −2 log_e P_m(H) and −2 log_e P(χ_u)


Distribution A
Test for 1 > 2

    a        p(a)      −2 log_e P_m(H)   −2 log_e P(χ_u)
    0      0.003377         0.004             0.004
    1      0.029052         0.036             0.030
    2      0.107092         0.181             0.155
    3      0.220948         0.583             0.536
    4      0.279062         1.413             1.386
    5      0.220948         2.842             2.894
    6      0.107092         5.055             5.195
    7      0.029052         8.332             8.382
    8      0.003377        13.383            12.484
  Total    1.000000         2.000             2.004


Distribution B

                              Test for 1 > 2                        Test for 1 < 2
    a        p(a)      −2 log_e P_m(H)   −2 log_e P(χ_u)   −2 log_e P_m(H)   −2 log_e P(χ_u)
    0      0.035397         0.036             0.040             8.682
    1      0.152244         0.239             0.210
    2      0.277764         0.805             0.724
    3      0.280688         1.905             1.839
    4      0.171775         3.662             3.767
    5      0.065168         6.178             6.646
    6      0.014962         9.581            10.56
    7      0.001900        14.11             15.56
    8      0.000102        20.38             21.67
  Total    1.000000         2.000             2.021


F. YATES 407 


Table 1. (cont.)

Distribution C

                              Test for 1 > 2                        Test for 1 < 2
    a        p(a)      −2 log_e P_m(H)   −2 log_e P(χ_u)   −2 log_e P_m(H)   −2 log_e P(χ_u)
    0      0.048696         0.049             0.065
    1      0.168403         0.288             0.264             4.190
    2      0.264922         0.874             0.776             2.152
    3      0.251227         1.907             1.791             1.011
    4      0.160692         3.426             3.460             0.416
    5      0.073459         5.442             5.890             0.144
    6      0.024809         7.955             9.148             0.041
    7      0.006309        10.96             13.27              0.009
    8      0.001220        14.46             18.29              0.002
    9      0.000180        18.47             24.23
   10      0.000020
   11      0.000002
  Total    0.999999


of 300 experiments) agree to within 0.15, and the expectation of −2 log_e P(χ_u) is almost
exactly 2. Distributions B and C show considerably greater individual differences, as is to
be expected, but the large differences are again confined to the tails, where they are of no
great consequence; for if values occur in the tail in any substantial proportion of the
experiments the significance of the combined results will not in any case be in doubt. The
expectations deviate in opposite directions for the two types of test but are still very close to 2.
On this last point it may be noted that Lancaster defines (without comment) what he
calls the crude χ² of a 2 × 2 table as

$$\frac{(N-1)(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}.$$

The customary form, which has been used in this paper for χ_u², has the factor N instead of
N − 1 in the numerator. Lancaster's form has the property, which is useful for the
combination of two-tail tests, that the expectation is 1. This, of course, does not make the
expectation of −2 log_e P(χ_u) equal to 2, but it does in fact bring it closer to 2 in the case of
symmetrical distributions. It can do little, however, to improve the expectations for
asymmetrical distributions, since the deviations are of opposite sign for the two types of
test.

From the above comparisons we may conclude that the determination of the one-tail
probabilities from χ² without correction for continuity will be satisfactory in cases likely
to be met with in practice, even when the expectations in the individual experiments are
quite small.
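
As a rough numerical check of the comparisons above, the following Python sketch recomputes the Distribution A columns of Table 1 from the marginal totals 100, 100 and 8. It assumes that P_m(H) is the probability whose −2 log_e equals the mean of −2 log_e P over the interval between successive exact tail sums; the definition lies outside this section, so this reading is an assumption, although it does agree with the tabulated 2.842 at a = 5. The function names are illustrative only.

```python
import math

def hypergeom_pmf(a, n1, n2, r1):
    """P(a) for a 2 x 2 table with column totals n1, n2 and first-row total r1."""
    return math.comb(n1, a) * math.comb(n2, r1 - a) / math.comb(n1 + n2, r1)

def normal_upper_tail(z):
    """One-tail probability P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def xlogx(x):
    """x * log(x), with the limiting value 0 at x = 0."""
    return x * math.log(x) if x > 0.0 else 0.0

def table_rows(n1, n2, r1):
    """Rows (a, p(a), -2 log P_m, -2 log P(chi_u)) for the test 1 > 2.

    Assumes r1 <= n2, so that a runs from 0 to min(n1, r1).
    """
    total = n1 + n2
    mean = r1 * n1 / total
    # Uncorrected chi uses the customary factor N (not N - 1).
    sd = math.sqrt(n1 * n2 * r1 * (total - r1) / total**3)
    pmf = [hypergeom_pmf(a, n1, n2, r1) for a in range(min(n1, r1) + 1)]
    rows = []
    for a, p in enumerate(pmf):
        upper = sum(pmf[a:])       # exact P(value >= a)
        above = sum(pmf[a + 1:])   # exact P(value >= a + 1)
        # Assumed mean-value form of -2 log P_m(H) over (above, upper).
        mean_value = 2.0 - 2.0 * (xlogx(upper) - xlogx(above)) / p
        chi_u = (a - mean) / sd
        rows.append((a, p, mean_value, -2.0 * math.log(normal_upper_tail(chi_u))))
    return rows

rows_a = table_rows(100, 100, 8)
expect_m = sum(p * m for _, p, m, _ in rows_a)
expect_u = sum(p * u for _, p, _, u in rows_a)
for a, p, m, u in rows_a:
    print(f"{a:2d}  {p:.6f}  {m:7.3f}  {u:7.3f}")
print(f"expectations: {expect_m:.3f}  {expect_u:.3f}")
```

Changing the arguments to (50, 100, 8) or (40, 200, 16) gives the corresponding 1 > 2 columns for the asymmetrical cases.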




VARIANTS OF THE TEST 


The use of values of χ² for 2 d.f. for the combination of probabilities is to a certain extent
arbitrary. It has the convenience that the values are easily calculated, and the use of a
function of the product of the probabilities has a certain intuitive appeal, but the method
would work equally well with other basic numbers of degrees of freedom. If, for instance,
the values of χ² for 1 d.f. corresponding to the P's are summed then in the absence of
association the sum will be distributed as χ² for k d.f. Equally (and in the type of data we are
considering this procedure requires even less computation) the values of χ_u in the absence
of association approximate to normal deviates with zero mean and unit standard deviation,
and their sum is therefore a normal deviate with a standard deviation of √k (see, for example,
Cochran, 1954).
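
The two simplest variants can be sketched in a few lines of Python. The block below is an illustration rather than the author's own computation: it takes the six signed χ_u values of column (2) of Table 2, converts them to one-tail probabilities, and evaluates both the combination of probabilities test (χ² with 2k d.f.) and the normal-deviate test on S(χ_u)/√k. The helper names are hypothetical, and the closed-form χ² tail used here holds only for even degrees of freedom.

```python
import math

# Signed chi values without continuity correction, column (2) of Table 2.
chi_u = [0.667, 2.056, 2.830, -0.204, 0.306, 2.198]

def normal_upper_tail(z):
    """One-tail probability P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df d.f. > x), valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of d.f.")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

k = len(chi_u)
one_tail_p = [normal_upper_tail(x) for x in chi_u]

# Combination of probabilities: sum of -2 log_e P, chi-square with 2k d.f.
fisher_stat = sum(-2.0 * math.log(p) for p in one_tail_p)
p_combination = chi2_upper_tail_even_df(fisher_stat, 2 * k)

# Sum of signed chi values: normal deviate with standard deviation sqrt(k).
p_sum_chi = normal_upper_tail(sum(chi_u) / math.sqrt(k))

print(f"S(-2 log P) = {fisher_stat:.3f}, P = {p_combination:.6f}")
print(f"S(chi_u)    = {sum(chi_u):.3f}, P = {p_sum_chi:.6f}")
```

Because the inputs are the rounded tabulated values, the results should agree with the levels quoted for these data only to within rounding.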

The calculations for these variants for the Drosophila data analysed in the paper referred
to (Yates, 1955) are exhibited in Table 2. The combination of probabilities test (column 4)
gives a significance level (one tail) of P = 0.000610, whereas the use of χ² for 1 d.f. instead
of 2 d.f. (column 5) gives P = 0.000760, and S(χ_u) (column 2) gives P = 0.000673.


Table 2. Calculation of significance levels for the Drosophila data

            χ_u²       χ_u       P(χ_u)    −2 log_e P(χ_u)   χ² (1 d.f.) for P(χ_u)
            (1)        (2)        (3)            (4)                  (5)
           0.4448     0.667     0.2524          2.753                1.309
           4.2274     2.056     0.01989         7.835                5.420
           8.0065     2.830     0.002327       12.126                9.272
           0.0418    −0.204     0.5808          1.087                0.305
           0.0938     0.306     0.3798          1.936                0.771
           4.8325     2.198     0.01397         8.542                6.042
  Total   17.6468     7.853                    34.279               23.119

  P        0.00718    0.000673                  0.000610             0.000760


These values are in close agreement. Indeed, except when the data from the different
experiments are mutually contradictory no great differences are to be expected between the
different possible methods that suggest themselves, since they merely demarcate
equi-probable contour surfaces in the multiple P space which are not markedly different.
Nevertheless, alternative tests which are merely variants based on the same information are
likely to give rise to confusion, and it is therefore recommended that some standard
procedure is adopted whenever a test based on the levels of significance in the individual
experiments or sets of observations is required. If the information presented to the
statistician consists of the relevant probabilities the combination of probabilities test commends
itself both on historical grounds and because of its simplicity and intuitive appeal. If a
one-tail test of a set of 2 × 2 tables based on the χ² values for each table is required, however,
there seems little point in transforming the values of χ_u into probabilities and then
re-transforming these into χ² values. It is therefore recommended that in such cases S(χ_u)
be adopted as the standard test criterion.




In contrast with the above values of P it may be noted that S(χ_u²) (column 1), which in the
absence of association would also be approximately distributed as χ² with 6 d.f., gives a
value of P = 0.00718. This considerably larger value of P is to be expected, since the
direction of the deviations is not taken into account.


EFFICIENCY OF THE TEST 


The significance levels given by the combination of probabilities test and by the ratio of
t₁ − t₂ to its standard error in the maximum-likelihood solution for the four examples of the
other paper (Yates, 1955) are as follows:


* Computed from the values of S(χ²) given by Pearson (1950).


In all three cases in which significance is attained the maximum-likelihood solution gives 
a higher level of significance than the combination of probabilities test. In two cases the 
difference is substantial. This suggests that the combination of probabilities test is not very 
efficient, though the differences may in part be due to chance causes or to inadmissible 
approximations in one or both the tests. 

Lancaster compared the power of the combination of probabilities test, the direct
summation of χ² test, and the test derived from the pooled data by numerical methods, taking
the case of the binomial distribution (q + p)ⁿ with p = 0.5 as the null hypothesis and
enumerating all possible events. He made a similar comparison of the first two tests and some
variants on a 2 × 2 table. Unfortunately, however, he used two-sided tests. This
considerably reduces the efficiency of the combination of probabilities or summation of χ² tests,
and his estimates of the relative power of the three tests are therefore not really relevant to
the situations met with in practice.

A full investigation of the efficiency of the combination of probabilities test is beyond the
scope of this note, but it is to be expected on general grounds that the combination of
probabilities test will be somewhat inefficient. In the first place no direct cognizance is
taken of the number of observations and other factors that affect the amount of information
given by the different experiments. It is true that the more accurate experiments will, on
the average, yield results of greater significance when there is a real difference. But if, for
example, of two experiments A and B, A yields 10 times the amount of information given
by B, we should naturally be inclined to give more weight to a significant result from
experiment A than to one from experiment B. But on the combination of probabilities test the
result P_A = 0.025, P_B = 0.3, for instance, will give exactly the same combined level of
significance (5 %, single tail), as P_A = 0.3, P_B = 0.025.

In the second place, even if all the experiments yield the same amount of information this
information is not in general combined efficiently by the combination of probabilities test.
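
The symmetry noted in the two-experiment example above is easy to verify. For two probabilities the combination of probabilities test refers −2 log_e(P_A P_B) to χ² with 4 d.f., whose upper tail has the closed form P_A P_B (1 − log_e(P_A P_B)). The following Python sketch (the function name is illustrative) shows that the combined level depends only on the product, so that interchanging the two results changes nothing, whatever the relative precision of the experiments.

```python
import math

def combine_two(p_a, p_b):
    """Combined level for two one-tail probabilities (chi-square, 4 d.f.)."""
    product = p_a * p_b
    # Upper tail of chi-square with 4 d.f. at x = -2 log(product).
    return product * (1.0 - math.log(product))

level_1 = combine_two(0.025, 0.3)
level_2 = combine_two(0.3, 0.025)
print(f"{level_1:.4f} {level_2:.4f}")
```

Both calls return the same value, a little under 0.05, in line with the rounded 5 % quoted in the text.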




As a simple example we may consider the case of a set of quantitative experiments which
furnish normally distributed estimates x₁, x₂, ..., x_k of a constant treatment difference Δ,
all having the same (known) variance σ². The efficient (indeed sufficient) combined estimate
of Δ will then be x̄, and this will be normally distributed with known variance σ²/k and can
therefore be used to give an exact overall P, P_L say. The exact level of significance can also
be calculated for each experiment separately, and these levels can then be combined to
give an overall P, P_C say. It is easily seen that in particular cases P_C may differ considerably
from P_L.


Table 3. Comparison of significance levels given by maximum likelihood (P_L)
and combination of probabilities (P_C) in a sampling experiment

  P(x̄ − 0.9)      P_L          P_C        log₁₀(P_C/P_L)
                 0.839        0.820
                 0.0808       0.0955          0.073
                 0.0764       0.0460         −0.220
                 0.0694       0.0908          0.117
                 0.0559       0.0409         −0.136
    0.659        0.0548       0.0296         −0.267
    0.641        0.0495       0.0512          0.015
    0.618        0.0436       0.0838          0.284
    0.540        0.0281       0.0145         −0.287
    0.480        0.0197       0.0367          0.270
    0.405        0.0122       0.0318          0.416
    0.405        0.0122       0.0132          0.034
    0.334        0.00734      0.0216          0.469
    0.258        0.00391      0.0100          0.408
    0.233        0.00307      0.00394         0.108
    0.142        0.00104      0.00118         0.055
    0.131        0.000874     0.00185         0.326
    0.111        0.000619     0.00287         0.666
    0.090        0.000404     0.00132         0.514
                 0.0000784    0.0000973


A test of the relative performance of the two tests was made for a specific example of this
type with k = 5, σ = 1, and a true treatment difference of 0.9. With these values x̄ will
attain significance at the 2.2 % level (single tail) in 50 % of all sets of five experiments. The
actual levels attained by x̄ and by the combination of probabilities test were calculated for
twenty such sets, using random normal deviates nos. 6551-6650, each increased by 0.9,
given by Wold (1948). The results are shown in Table 3. x̄ − 0.9 will be normally distributed
about zero mean with standard deviation 1/√5. The probability P(x̄ − 0.9) of getting a value
of x̄ − 0.9 greater than the observed value was calculated for each set, and the results have
been arranged in order of magnitude of this probability. With a large number of sets there
would be a uniform distribution of P(x̄ − 0.9) over the range 0 to 1.
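
A sampling experiment of this kind can be imitated as follows. The Python sketch below is only an illustration under stated assumptions: it draws its deviates from the standard library generator with an arbitrary seed rather than from Wold's tables, so the individual values will not match Table 3, and the names `levels_for_set`, `p_l` and `p_c` are hypothetical. For each set of five estimates it computes the level given by the mean and the level given by the combination of probabilities test.

```python
import math
import random

def normal_upper_tail(z):
    """One-tail probability P(Z > z) for a standard normal deviate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df d.f. > x), valid for even df only."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def levels_for_set(x, sigma=1.0):
    """Return (p_l, p_c) for one set of k estimates with known sigma."""
    k = len(x)
    mean = sum(x) / k
    # Level from the sufficient statistic: mean has variance sigma^2 / k.
    p_l = normal_upper_tail(mean * math.sqrt(k) / sigma)
    # Level from combining the k separate one-tail probabilities.
    stat = sum(-2.0 * math.log(normal_upper_tail(xi / sigma)) for xi in x)
    p_c = chi2_upper_tail_even_df(stat, 2 * k)
    return p_l, p_c

rng = random.Random(1955)  # arbitrary seed, not Wold's deviates
results = [levels_for_set([rng.gauss(0.9, 1.0) for _ in range(5)])
           for _ in range(20)]
mean_log_ratio = sum(math.log10(pc / pl) for pl, pc in results) / len(results)
print(f"average log10(P_C / P_L) over 20 sets: {mean_log_ratio:.3f}")
```

When every estimate equals 0.9 exactly, `levels_for_set` gives the 2.2 % level for the mean and a distinctly larger level for the combined test.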

The results show clearly that P_L tends to be less than P_C, and that the difference is greater
in the experiments with small P(x̄ − 0.9) which, of course, also have small P_L. The average
value of log₁₀(P_C/P_L) is 0.146. In other words, the significance levels are on the average in
the ratio of 1.40 : 1. There is thus a clear loss of efficiency with P_C. Equally important,
there are considerable discrepancies in individual sets of experiments, leading to a conflict
of evidence from the same body of data which arises solely from the use of an inefficient test.


SUMMARY 


It is shown that χ² without correction for continuity will give one-tail probabilities for 2 × 2
tables which may be safely combined in most cases likely to be met with in practice. The
summation of the corresponding signed values of χ gives a rapid method of combination.
Reasons are given for believing that combination of probabilities tests are not likely to be
very efficient, and this conclusion is demonstrated by a small sampling experiment.


REFERENCES 


COCHRAN, W. G. (1942). Iowa State Coll. J. Sci. 16, 421.

COCHRAN, W. G. (1952). Ann. Math. Statist. 23, 315.

COCHRAN, W. G. (1954). Biometrics, 10, 417.

LANCASTER, H. O. (1949). Biometrika, 36, 370.

PEARSON, E. S. (1950). Biometrika, 37, 383.

WOLD, H. (1948). Random Normal Deviates. Tracts for Computers, no. XXV.
Cambridge University Press.

YATES, F. (1934). J. R. Statist. Soc. Suppl. 1, 217.

YATES, F. (1955). Biometrika, 42, 382.




A TEST FOR HOMOGENEITY OF THE MARGINAL DISTRIBUTIONS 
IN A TWO-WAY CLASSIFICATION 


By ALAN STUART 
Division of Research Techniques, London School of Economics 


1. INTRODUCTION 


There are several circumstances in which we may wish to test the homogeneity of the two 
sets of marginal probabilities in a two-way classification. For example, a sample from a 
bivariate distribution (say height of father, height of son) may be classified into a two-way 
table with identical (height) groupings in each margin. Or a similar classification may be 
possible for a non-measurable variable (say strength of right hand — strength of left hand). 
Again, in surveys of the same sample (a ‘panel’) on two different occasions, the inter- 
relation of the results on the two occasions may be displayed in a two-way table, with one 
margin corresponding to each occasion. In all these cases, the question may arise: are the 
two sets of marginal probabilities identical? 

If the variable is measurable, we may test the difference between the means of the two 
marginal distributions by a large-sample standard-error test. However, we may be in- 
terested in the overall distributions, rather than only in their means. For the more 
stringent hypothesis of homogeneity, a test exists if we have two completely independent 
samples, when an ordinary χ² test of homogeneity may be applied (Cramér, 1946, p. 445).
This test does not meet the essentially bivariate situations described above, where 
non-independence of the marginal distributions is a fundamental feature of the 
problem. 

When the classification is a double dichotomy, the problem of testing marginal
homogeneity is simple, and its solution is a special case of the large-sample solution of the more
general 2^r classification problem given by Cochran (1950). Bowker (1948) gave a
large-sample test for complete symmetry in a two-way classification, a more restrictive
hypothesis which is concerned with the entire set of probabilities in the classification, and not
only with the marginal probabilities as we are here. In the present paper, a large-sample
test for marginal homogeneity is derived and illustrated.


2. VARIANCES AND COVARIANCES 


With the usual notation for an m × m table, we denote the probability of falling into the ith
row and jth column by $p_{ij}$, and define the marginal probabilities by

$$p_{i.} = \sum_j p_{ij}, \qquad p_{.j} = \sum_i p_{ij},$$

while obviously

$$\sum_i \sum_j p_{ij} = \sum_i p_{i.} = \sum_j p_{.j} = 1.$$

The corresponding sample numbers are denoted by $n_{ij}$, $n_{i.}$, $n_{.j}$, while the total sample size
is n.




ALAN STUART 413 


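By standard multinomial theory, we have for the means, variances and co-variances
of the $n_{ij}$

$$\begin{aligned}
E(n_{ij}) &= n p_{ij},\\
V(n_{ij}) &= n p_{ij}(1 - p_{ij}),\\
C(n_{ij}, n_{kl}) &= -n p_{ij} p_{kl} \quad (i \neq k \text{ and/or } j \neq l),
\end{aligned} \tag{1}$$

and also

$$\begin{aligned}
E(n_{i.}) &= n p_{i.},\\
V(n_{i.}) &= n p_{i.}(1 - p_{i.}),\\
C(n_{i.}, n_{j.}) &= -n p_{i.} p_{j.} \quad (i \neq j),\\
C(n_{.i}, n_{.j}) &= -n p_{.i} p_{.j} \quad (i \neq j).
\end{aligned} \tag{2}$$

We now require

$$C(n_{i.}, n_{.j}) = V(n_{ij}) + \sum C(n_{ik}, n_{lj}),$$

where $i \neq l$ and/or $k \neq j$. From (1), this is

$$\begin{aligned}
C(n_{i.}, n_{.j}) &= n p_{ij}(1 - p_{ij}) - n \sum p_{ik} p_{lj}\\
&= n(p_{ij} - p_{i.} p_{.j})
\end{aligned} \tag{3}$$

on taking the term in $p_{ij}^2$ into the summation. We now define the statistics

$$d_i = n_{i.} - n_{.i} \quad (i = 1, 2, \ldots, m).$$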
Since the likelihood-ratio principle yields an intractable result in this case, it seems natural
to use the $d_i$ as the basis for a test concerning the differences between the corresponding
marginal probabilities. For their means, variances and co-variances, we have, from (2)
and (3), the exact results:

$$E(d_i) = E(n_{i.} - n_{.i}) = n(p_{i.} - p_{.i}), \tag{4}$$

$$\begin{aligned}
V(d_i) &= V(n_{i.}) + V(n_{.i}) - 2C(n_{i.}, n_{.i})\\
&= n\{(p_{i.} + p_{.i} - 2p_{ii}) - (p_{i.} - p_{.i})^2\}.
\end{aligned} \tag{5}$$

Also, for $i \neq j$,

$$\begin{aligned}
C(d_i, d_j) &= C(n_{i.}, n_{j.}) + C(n_{.i}, n_{.j}) - C(n_{i.}, n_{.j}) - C(n_{.i}, n_{j.})\\
&= -n\{(p_{ij} + p_{ji}) + (p_{i.} - p_{.i})(p_{j.} - p_{.j})\}.
\end{aligned} \tag{6}$$

3. TESTING THE HYPOTHESIS OF HOMOGENEITY 
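Suppose that we wish to test the hypothesis

$$H: \; p_{i.} - p_{.i} = \Delta_i \quad (i = 1 \text{ to } m),$$

where, of course, we must have $\sum_{i=1}^{m} \Delta_i = 0$.
(4), (5) and (6) then become

$$\begin{aligned}
E(d_i \mid H) &= n\Delta_i,\\
V(d_i \mid H) &= n\{(p_{i.} + p_{.i} - 2p_{ii}) - \Delta_i^2\},\\
C(d_i, d_j \mid H) &= -n\{(p_{ij} + p_{ji}) + \Delta_i \Delta_j\} \quad (i \neq j),
\end{aligned} \tag{7}$$
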

414 Homogeneity of marginal distributions 
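and in the special case of particular interest here, where we have

$$H_0: \; \Delta_i = 0 \quad (\text{all } i),$$

(7) becomes

$$\begin{aligned}
E(d_i \mid H_0) &= 0,\\
V_{ii} = V(d_i \mid H_0) &= n(p_{i.} + p_{.i} - 2p_{ii}),\\
V_{ij} = C(d_i, d_j \mid H_0) &= -n(p_{ij} + p_{ji}) \quad (i \neq j).
\end{aligned} \tag{8}$$

We have deliberately refrained from replacing $p_{.i}$ by $p_{i.}$ in $V_{ii}$, as we could have done,
since if, as will generally be the case, the true probabilities are unknown, the
maximum-likelihood estimators of the variances and co-variances in (8) are

$$\begin{aligned}
\hat{V}_{ii} &= n_{i.} + n_{.i} - 2n_{ii},\\
\hat{V}_{ij} &= -(n_{ij} + n_{ji}).
\end{aligned} \tag{9}$$
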
It is well known that the $n_{ij}$ have a limiting multinormal distribution with moments given
by (1). It follows that the $n_{i.}$ and $n_{.j}$ will also be jointly normally distributed, since they are
linear functions of the $n_{ij}$, and finally, by the same argument, that the variates $d_i$ (i = 1
to m) will be asymptotically multinormally distributed with moments given by (7) in
general, and by (8) on the null hypothesis. To a further degree of approximation, the same
result holds if (9) is used as an estimator of (8).
Now it is a standard result that in the exponent, say $-\tfrac{1}{2}Q$, of a multinormal distribution,
$Q$ is distributed like $\chi^2$ with degrees of freedom equal in number to the rank of the
distribution. Furthermore, $Q$ in this exponent is simply a quadratic form in the variables.
Thus, on $H_0$,

$$Q = \sum_{i,j=1}^{m} a_{ij} d_i d_j \tag{10}$$

is distributed in the limit like $\chi^2$ with (m − 1) degrees of freedom, the rank of the distribution
being (m − 1) since $\sum_{i=1}^{m} d_i = 0$, and in general this is the only constraint on the $d_i$. If the
distribution were non-singular, the coefficients $a_{ij}$ would simply be the elements of the inverse
of the variance-covariance matrix $(V_{ij})$. As a result of the singularity, $(V_{ij})$ itself is also
singular, and so we cannot invert it.

However, any marginal distribution of a multinormal distribution is itself normal, so
that we may eliminate the redundant variate (say the last) and obtain the result that

$$Q = \sum_{i,j=1}^{m-1} V^{ij} d_i d_j \tag{11}$$

is distributed in the limit like $\chi^2$ with (m − 1) degrees of freedom, (m − 1) still being the rank
of the distribution. If we replace the $V_{ij}$ in (11) by their maximum-likelihood estimators
given by (9), the same asymptotic result holds. In (11), $(V_{ij})$ is now understood to be defined
for i, j = 1 to m − 1 only, and $(V^{ij})$ is its inverse.

The fact that (11) takes no explicit account of $d_m$ lends it an appearance of arbitrariness,
but although the values of the terms in Q are changed by eliminating some other $d_i$ instead
of $d_m$, their sum Q is uniquely determined. This is in virtue of the fact that since any (m − 1)
of the $d_i$ uniquely determine the other one, Q must be, apart from constants, the log
likelihood of the complete set of $d_i$, irrespective of which of the $d_i$ is omitted in Q. In point of
fact, Q could be expressed as a function of all the m values of $d_i$ and the complete matrix
$(V_{ij})$, but this merely complicates the computational procedure and makes no difference
to the result.

Finally, we must determine the appropriate critical region of the distribution of Q. 
(7) shows that, when $H_0$ is not true, the variance-covariance matrix of the $d_i$ may be written

$$(V_{ij}) = n(w_{ij}),$$

where $(w_{ij})$ is independent of n. The inverse matrix is therefore

$$(V^{ij}) = \frac{1}{n}(w^{ij}).$$

Thus the expected value of Q in (11) is

$$E(Q \mid H) = \sum_{i,j=1}^{m-1} V^{ij} E(d_i d_j)
= \frac{1}{n} \sum_{i,j=1}^{m-1} w^{ij}\{C(d_i, d_j) + E(d_i)E(d_j)\},$$

so that, using (7),

$$E(Q \mid H) = O(n). \tag{12}$$

On the other hand, the limiting $\chi^2$ distribution of Q on $H_0$ gives us

$$E(Q \mid H_0) = m - 1, \tag{13}$$

independent of n. Thus, with increasing n, the difference between (12) and (13) will exceed
any bound, and the appropriate critical region for our large-sample test is the upper tail
of the distribution of Q. The same conclusion holds if $(V_{ij})$ is replaced by its
maximum-likelihood estimator $(\hat{V}_{ij})$.


4. COMPUTATIONAL PROCEDURE: AN EXAMPLE 
We now set out the computation necessary for the test, and give an illustration of its use.
(1) Form the matrix $(\hat{V}_{ij})$ given by (9), for i, j = 1 to (m − 1).
(2) Invert $(\hat{V}_{ij})$ to obtain $(\hat{V}^{ij})$.
(3) Compute $Q = \sum \hat{V}^{ij} d_i d_j$, where $d_i = n_{i.} - n_{.i}$, and test in the $\chi^2$ distribution with
(m − 1) degrees of freedom, the critical region being the upper tail.


Consider the following example, which uses data quoted by Stuart (1953).


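7477 women aged 30-39; unaided distance vision

                                    Left eye
Right eye          Highest    Second     Third    Lowest     Total
                    grade      grade     grade     grade
Highest grade        1520       266       124        66      1976
Second grade          234      1512       432        78      2256
Third grade           117       362      1772       205      2456
Lowest grade           36        82       179       492       789

Total                1907      2222      2507       841      7477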


We have d₁ = 1976 − 1907 = 69,
d₂ = 2256 − 2222 = 34,
d₃ = 2456 − 2507 = −51,
and we check that d₄ = 789 − 841 = −52 equals the sum (d₁ + d₂ + d₃), negatively signed.




Since d₁ and d₂ are positive, while d₃ and d₄ are negative, one may ask whether the sight of
the right eye may really be regarded as better than that of the left eye for the population
from which this is a sample. The estimated variance matrix of d₁, d₂, d₃ is obtained by use
of (9), as

$$\hat{V}_{11} = (266 + 124 + 66) + (234 + 117 + 36) = 843,$$

$$\hat{V}_{12} = \hat{V}_{21} = -(234 + 266) = -500,$$

and so on, giving

$$(\hat{V}_{ij}) = \begin{pmatrix} 843 & -500 & -241\\ -500 & 1454 & -794\\ -241 & -794 & 1419 \end{pmatrix}.$$

The inverse is obtained directly as

$$(\hat{V}^{ij}) = 10^{-6}\begin{pmatrix} 2482 & 1560 & 1295\\ 1560 & 1972 & 1368\\ 1295 & 1368 & 1690 \end{pmatrix},$$


and we have, for our $\chi^2$ statistic with 3 degrees of freedom,

$$\begin{aligned}
Q = 10^{-6}\{&(2482 \times 69^2) + (1972 \times 34^2) + (1690 \times 51^2) + 2(1560 \times 69 \times 34)\\
&- 2(1295 \times 69 \times 51) - 2(1368 \times 34 \times 51)\}\\
= 11.96.&
\end{aligned}$$

The 1 % point for the $\chi^2$ distribution with 3 degrees of freedom is 11.345, so we conclude that
the result is significant of a difference in the distribution of sight in right and left eye.
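If, instead of eliminating d₄, we had eliminated, say, d₁, we should have obtained as above:

$$(\hat{V}_{ij}) = \begin{pmatrix} 1454 & -794 & -160\\ -794 & 1419 & -384\\ -160 & -384 & 646 \end{pmatrix},$$

$$(\hat{V}^{ij}) = 10^{-6}\begin{pmatrix} 1332 & 995 & 921\\ 995 & 1583 & 1187\\ 921 & 1187 & 2482 \end{pmatrix},$$

and finally

$$\begin{aligned}
Q = 10^{-6}\{&(1332 \times 34^2) + (1583 \times 51^2) + (2482 \times 52^2) + 2(1187 \times 51 \times 52)\\
&- 2(995 \times 34 \times 51) - 2(921 \times 34 \times 52)\}\\
= 11.96&
\end{aligned}$$

as before.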
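
The three computational steps can be sketched in Python without any external library. The block below is an illustration, not the author's procedure verbatim: the function names `solve` and `stuart_q` are hypothetical, the linear system is solved by Gaussian elimination instead of forming the inverse explicitly, and the cell counts are those of the vision table as commonly reproduced, which agree with the marginal totals and the estimated variances quoted in the example. The `drop` argument selects which $d_i$ is eliminated, so the invariance of Q can be checked directly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x

def stuart_q(table, drop=None):
    """Q of (11) with the estimators (9), eliminating category `drop`."""
    size = len(table)
    if drop is None:
        drop = size - 1
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(size)) for j in range(size)]
    keep = [i for i in range(size) if i != drop]
    d = [row[i] - col[i] for i in keep]
    v = [[row[i] + col[i] - 2 * table[i][i] if i == j
          else -(table[i][j] + table[j][i])
          for j in keep] for i in keep]
    # Q = d' V^{-1} d, computed as d . x where V x = d.
    x = solve(v, d)
    return sum(di * xi for di, xi in zip(d, x))

# Rows: right eye, columns: left eye (highest to lowest grade).
vision = [
    [1520, 266, 124, 66],
    [234, 1512, 432, 78],
    [117, 362, 1772, 205],
    [36, 82, 179, 492],
]

q_drop_last = stuart_q(vision)           # eliminate d_4
q_drop_first = stuart_q(vision, drop=0)  # eliminate d_1
print(f"Q = {q_drop_last:.2f} (d_4 omitted), {q_drop_first:.2f} (d_1 omitted)")
```

Both values should agree with each other to rounding error and with the 11.96 obtained above from the rounded inverse.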


I am grateful to Mr J. Durbin for several illuminating discussions of this problem. 


REFERENCES 


BOWKER, A. H. (1948). A test for symmetry in contingency tables. J. Amer. Statist. Ass. 43,
572-4.

COCHRAN, W. G. (1950). The comparison of percentages in matched samples. Biometrika, 37, 256-66.

CRAMÉR, HARALD (1946). Mathematical Methods of Statistics. Princeton University Press.

STUART, A. (1953). The estimation and comparison of strengths of association in contingency tables.
Biometrika, 40, 105-10.




DISTRIBUTIONS OF KENDALL'S TAU BASED ON PARTIALLY 
ORDERED SYSTEMS 


By SOL HABERMAN 
University of Minnesota 


INTRODUCTION 


If a pair of objects a, b are connected by a relation R which is non-reflexive, anti-symmetric
and transitive, we say, if aRb, that a precedes b. Suppose we have a set of n objects in which
R holds between some, not necessarily all, objects. The situation may be represented
diagrammatically by a set of points, one to each object, joined by lines between each pair of
objects which are connected by R. The relation aRb between any pair exists, if at all, only if
there is a path along the network joining the corresponding points. We may represent the


For convenience, in this paper, we shall regard all our directions as going downwards (from 
the head of the page towards the foot), and it will not be necessary actually to draw the 


arrows. 
On this understanding, any downward path in the diagram corresponds to an ordering
of the objects through which it passes. Now suppose that each of the n objects exhibits one
of the ranks of each of a set of ranked variables. For example, if there are two variables,
A, a dichotomy into male and female (ranked 1, 2), and B, a fourfold division into social
class, working, lower-middle, upper-middle and upper (ranked 1, 2, 3, 4), any individual
under consideration will bear one of the ranks for each of A and B; the ranking (23), for
instance, denoting a female in the upper middle class. In this particular case there are
2 × 4 = 8 types of individual and the number n of objects is accordingly 8.
If comparisons are made between neighbouring members differing only in one variable,
beginning with (1, 1) and ending with (2, 4) we have what is called a partial ordering. Thus
(11), (12), (13), (23), (24) is such an ordering, and so is (11), (21), (22), (23), (24). The situation
is illustrated in Fig. 1, both partial orderings corresponding to downward paths on the
diagram. Generally, if the objects correspond to rankings of $p_1, p_2, \ldots, p_r$ a partial ordering
will be a set of $p_1 + p_2 + \cdots + p_r - r + 1$ objects such that any consecutive pair is of the type
(a, b, ..., j, ...), (a, b, ..., j + 1, ...). The number n is equal to $p_1 p_2 \cdots p_r$.

There will be n! rankings of the n objects and

$$\frac{\{\sum (p_j - 1)\}!}{(p_1 - 1)!\,(p_2 - 1)! \cdots (p_r - 1)!}$$

partial orderings beginning with (111 ... 1) and ending with ($p_1 p_2 \ldots p_r$). For in the process
of following through any partial ordering any rank, say the jth, has to make $p_j - 1$ unit moves
and these can occur at any point in the $\sum (p_j - 1)$ moves. Thus with $p_1 = 2$, $p_2 = 4$ there are
four partial orderings:

(11) (21) (22) (23) (24)
(11) (12) (22) (23) (24)          (1)
(11) (12) (13) (23) (24)
(11) (12) (13) (14) (24).
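
The counting rule above lends itself to a direct check. The following Python sketch (function names are illustrative) evaluates the multinomial coefficient for given numbers of ranks and, independently, enumerates the partial orderings as paths in which one rank is raised by a unit at each step.

```python
from math import factorial

def count_partial_orderings(levels):
    """Multinomial coefficient {sum(p_j - 1)}! / prod (p_j - 1)!."""
    moves = [p - 1 for p in levels]
    total = factorial(sum(moves))
    for m in moves:
        total //= factorial(m)
    return total

def partial_orderings(levels):
    """List every path from (1, ..., 1) to (p_1, ..., p_r) by unit moves."""
    end = tuple(levels)

    def extend(path):
        current = path[-1]
        if current == end:
            yield path
            return
        for j, p in enumerate(levels):
            if current[j] < p:
                nxt = current[:j] + (current[j] + 1,) + current[j + 1:]
                yield from extend(path + [nxt])

    return list(extend([tuple(1 for _ in levels)]))

paths_2x4 = partial_orderings((2, 4))
for path in paths_2x4:
    print(" ".join("(" + "".join(map(str, obj)) + ")" for obj in path))
print(count_partial_orderings((2, 4)), count_partial_orderings((2, 2, 2)))
```

For two ranks by four the enumeration yields four paths of five objects each, and for three dichotomies six paths of four objects each.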


With $p_1 = 2$, $p_2 = 2$, $p_3 = 2$ there are six partial orderings each comprising four objects;
and so on. The totality of partial orderings may be called the set of such orderings.
Consider a ranking of the n objects, e.g. for 2 × 4 objects,


(11) (12) (13) (14) (21) (22) (23) (24). (2)


We see that every pair of objects in the partial orderings (1) are in the same order as in the
ranking (2). Such a ranking is said to be consistent with the set of partial orderings. It is
not unique, e.g.

(11) (12) (13) (21) (14) (22) (23) (24) (3)


has the same property. Any other ranking may be compared with one of the consistent
rankings by considering the minimum number of interchanges of adjacent pairs necessary
to transform one to the other. More than one consistent ranking may be reached by the
same minimum number of interchanges.




418 Distributions of Kendall’s tau based on partially ordered systems | 
Fig. 1. Partially ordered system for two variables, one dichotomous and the other fourfold.


The problem discussed in this paper may be formulated as follows: A set of partial 
orderings is given, and a ranking of the objects is observed. Can this be regarded as con- 
sistent with the partial orderings? If the observed ranking is not subject to error, of course, 
this question is simply decided by examining whether it is one of the consistent rankings. 
We may also, however, allow for error in the observed ranking and regard it as consistent 
in a probabilistic sense if it is close to a consistent ranking, ‘closeness’ in this sense being 
measured by the minimum number of interchanges of adjacent pairs necessary to transform 
the observed to a consistent ranking. 

This number of interchanges will be denoted by s, and our object is to find the frequency
distribution of s in the population of n! possible rankings of the objects. Thus, the
hypothesis that an observed ranking is consistent with a set of partial orderings will be accepted
if s is small; or, equivalently, if the probability of the observed s or a lower value is below
some assigned significance level. The procedure is similar to Kendall's use of s to measure
the agreement between two rankings (cf. Kendall, 1955). The difference in our case is that 


SoL HABERMAN 419 


we require the agreement between an observed ranking and one of a number of possible
consistent rankings, not a unique one. Kendall's τ is defined in terms of s by

$$\tau = 1 - \frac{4s}{n(n-1)}, \tag{4}$$

but for our purposes it will be sufficient, and more convenient, to work with s itself.
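
For very small systems the distribution of s can be obtained by brute force. The Python sketch below is an illustration under stated assumptions: it treats one object as preceding another when every rank of the first is less than or equal to the corresponding rank of the second, takes s as the smallest number of inverted pairs between the observed ranking and any consistent ranking (which equals the minimum number of adjacent interchanges), and uses hypothetical function names throughout. It is practicable only while n! remains small.

```python
from collections import Counter
from itertools import permutations, product

def objects(levels):
    """All combinations of ranks, e.g. (1, 1) ... (2, 4) for levels (2, 4)."""
    return list(product(*[range(1, p + 1) for p in levels]))

def precedes(a, b):
    """True if a comes before b in every partial ordering containing both."""
    return a != b and all(x <= y for x, y in zip(a, b))

def is_consistent(ranking):
    position = {obj: i for i, obj in enumerate(ranking)}
    return all(position[a] < position[b]
               for a in ranking for b in ranking if precedes(a, b))

def interchanges(observed, reference):
    """Minimum adjacent interchanges = number of inverted pairs."""
    position = {obj: i for i, obj in enumerate(reference)}
    seq = [position[obj] for obj in observed]
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq))
               if seq[i] > seq[j])

def s_distribution(levels):
    """Frequency of s over all n! rankings of the objects."""
    rankings = list(permutations(objects(levels)))
    consistent = [r for r in rankings if is_consistent(r)]
    return Counter(min(interchanges(r, c) for c in consistent)
                   for r in rankings)

def kendall_tau(s, n):
    """Kendall's tau from the number of interchanges s, as in (4)."""
    return 1.0 - 4.0 * s / (n * (n - 1))

dist_2x2 = s_distribution((2, 2))
dist_2x3 = s_distribution((2, 3))
print(sorted(dist_2x2.items()))
print(sorted(dist_2x3.items()))
```

The frequency at s = 0 is the number of consistent rankings, and the frequencies sum to n!.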


DISTRIBUTION OF s FOR THREE DICHOTOMOUS VARIABLES
I consider in the first place three dichotomous variables ($p_1 = p_2 = p_3 = 2$). There are, as
noted above, six partial orderings of four. The corresponding diagram is given in Fig. 2.
The 8 objects give rise to 8! = 40,320 rankings. Of these 48 are consistent.


Fig. 2. Partially ordered system for three dichotomous variables. 


This last result may be seen as follows: Since (111) comes first and (222) last we need only
consider the other six objects. The six partial orderings are (112, 122); (112, 212); (121, 122);
(121, 221); (211, 221); (211, 212). These do not impose any conditions on the order of the
three (112), (121), (211) or of (122), (212), (221). But any member of the first precedes two
members of the second. The first three must then occupy three of the first four places. It
will be seen that if there is no overlap between members of the two sets the number of
arrangements is 3! × 3! = 36. The only possible overlap occurs when one member of the
first triad follows a member of the second in the fifth place. This gives rise to 12 further
possibilities, making 48 in all.
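
The count of 48 can be confirmed by enumeration. The Python sketch below (variable names are illustrative) runs through the 720 arrangements of the six intermediate objects, retains those in which each member of the first triad precedes the two members of the second triad that it must precede, and separates the arrangements without overlap from those with overlap.

```python
from itertools import permutations

first_triad = [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
second_triad = [(1, 2, 2), (2, 1, 2), (2, 2, 1)]

def precedes(a, b):
    """True if a must come before b (no rank of a exceeds that of b)."""
    return a != b and all(x <= y for x, y in zip(a, b))

def consistent(order):
    position = {obj: i for i, obj in enumerate(order)}
    return all(position[a] < position[b]
               for a in order for b in order if precedes(a, b))

no_overlap = 0
overlap = 0
for order in permutations(first_triad + second_triad):
    if not consistent(order):
        continue
    last_of_first = max(order.index(obj) for obj in first_triad)
    first_of_second = min(order.index(obj) for obj in second_triad)
    if last_of_first < first_of_second:
        no_overlap += 1   # all of the first triad precede the second
    else:
        overlap += 1      # one of the first triad follows one of the second

print(no_overlap, overlap, no_overlap + overlap)
```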
Denote (111) by A; each of (211), (121), (112) by B; each of (221), (212), (122)
by C; and (222) by D. In a ranking of 8, A and D can occur in 56 different ways, which we
can classify into 14 categories as follows:

[Diagrams of categories (1), (2), (3), and so on up to (7), showing the positions of A and D among the eight places.]
The seven categories which follow are a horizontal mirror image of the first seven.
When any given ranking of the three B's and C's is placed in the blank positions it is
found that s increases by 1 for each successive category. We can then build up the
distribution of s in the 56 × 720 = 40,320 rankings of 8 when we have the distribution for one case,
e.g. the array with A first and D last. It is not easy to ascertain this distribution. There are
720 members and each has to be compared with one of 48 rankings. By writing down the
720 permutations and counting I find the distribution given in Table 1. The distribution
for the 40,320 rankings is then obtained as in Table 2.


Table 1. Distribution of s for the 6 B's and C's 


Frequency of s 




Some further distributions for the case of two variates are given in Table 3 for the 2 × 2,
2 × 3, 2 × 4 and 3 × 3 cases.

With four dichotomous variables there are 16! rankings to be considered. (1111) and 
(2222) occur in 240 ways in a ranking of 16 and, as before, may be treated independently of 
the remaining 14 combinations in any enumeration for the distribution of s. Since direct 
enumeration of the distribution of s for the remaining 14 combinations would have been 
too tedious, an estimate of the form of the distribution was obtained by sampling.

The 14 combinations were written on slips which were placed in a bowl from which drawings
were made to obtain a sample of some 2500 rankings. This number was held to be substantial
in view of the 240 rankings of 16 associated with each of the 2500. The 240 × 2500


420 Distributions of Kendall's tau based on partially ordered systems


SOL HABERMAN 421


values of s were weighted according to the theoretical frequencies of certain sub-distributions
to which they belonged. When graphed as a frequency polygon, the final estimate of
the form of the distribution appeared almost identical with a normal curve. I am unable
at this stage to give a more comprehensive account of the distributions of s and partially
ordered systems.


Table 2. Computations used to obtain the distribution of s for the rankings of the eight
combinations of three dichotomous variables

[Table body not recoverable from the source. Legible fragments include the entries 96, 108, 120, 96, 24, 48, 36, 12, 12, 60 and a row reading 3444, 3828, 4008, 3972, 3720, 3276, 2724, 2112.]

Grand total, 40,320


MISSING COMBINATIONS


In the preceding, distributions of s are given for the situation where all of the possible
combinations of the ranks of a set of variables are considered. It happens in practice, however,
that some combinations do not appear at all or else appear with negligible frequencies.
In such cases those combinations cannot be ranked with respect to the others. Since the
problem discussed in this paper is to test if an observed ranking of combinations is consistent
with a partially ordered system of the same combinations, we have to derive distributions
of s for partially ordered systems in which those combinations do not appear.
This was done in connexion with the derivation of the distribution of s for the case of three
dichotomous variables. Table 1 gives the distribution which is based on the partially
ordered system in which both (111) and (222) are missing.

When combinations are considered a few at a time, the distributions so formed often fall
into classes where within each class all are identical. For instance, when, in the case of
three dichotomous variables, the combinations are considered 7 at a time we get either of
two forms of distribution of s:
(a) Form F, if one of the B's or C's is missing; and
(b) Form G, if either A or D is missing.
The two forms of distribution are given in Table 4.




Table 3. Two variable cases 


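A, both dichotomous. B, one dichotomous and the other having three ranks.
C, one dichotomous and the other having four ranks. D, both having three ranks.

[Table body (values of s and their frequencies for cases A to D) not recoverable from the source; the only legible entry is the total 362,880.]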




Table 4. Distribution of s for three dichotomous variables with one combination missing 


[Table body not recoverable from the source: row labels for s = 0 to 9 are legible, but the frequencies for Forms F and G are lost.]




Table 5. Distribution of s for the case of two variables, one dichotomous 
and one fourfold, with one combination missing 


Frequencies (two distributions in each form)

[Table body not recoverable from the source: only the row labels s = 0 to 9 survive.]


Again, when in the case of two variables where one is dichotomous and the other has
four ranked classifications, the combinations are considered 7 at a time, we get four forms
of distribution of s:

(a) Form P, if either (11) or (24) is missing;

(b) Form Q, if either (21) or (14) is missing;

(c) Form R, if either (12) or (23) is missing; and

(d) Form S, if either (13) or (22) is missing.

The four forms of distribution are given in Table 5.


The writer is indebted to Prof. F. Stuart Chapin for encouragement and to Prof. 
Palmer O. Johnson for guidance. 


REFERENCE 
KENDALL, M. G. (1955). Rank Correlation Methods, 2nd ed. London: Chas. Griffin and Co.




ON A CLASS OF SKEW DISTRIBUTION FUNCTIONS 


By HERBERT A. SIMON†
Carnegie Institute of Technology


I. INTRODUCTION


It is the purpose of this paper to analyse a class of distribution functions that appears in
a wide range of empirical data, particularly data describing sociological, biological and
economic phenomena. Its appearance is so frequent, and the phenomena in which it appears
so diverse, that one is led to the conjecture that if these phenomena have any property in
common it can only be a similarity in the structure of the underlying probability mechanisms.
The empirical distributions to which we shall refer specifically are: (A) distributions of
words in prose samples by their frequency of occurrence, (B) distributions of scientists by
number of papers published, (C) distributions of cities by population, (D) distributions of
incomes by size, and (E) distributions of biological genera by number of species.

No one supposes that there is any connexion between horse-kicks suffered by soldiers 
in the German army and blood cells on a microscope slide other than that the same urn 
scheme provides a satisfactory abstract model of both phenomena. It is in the same direc- 
tion that we shall look for an explanation of the observed close similarities among the five 
classes of distributions listed above. 

The observed distributions have the following characteristics in common:

(a) They are J-shaped, or at least highly skewed, with very long upper tails. The tails
can generally be approximated closely by a function of the form

\[ f(i) = (a/i^{k})\, b^{i}, \tag{1·1} \]

where a, b, and k are constants; and where b is so close to unity that in first approximation
the final factor has a significant effect on f(i) only for very large values of i. Thus, for example,
the number of words that occur exactly i times in James Joyce's Ulysses is about $a/i^{k}$; the
number of authors who published exactly i papers in Econometrica over a twenty-year
period is approximately $a/i^{k}$; and so on.

(b) The exponent, k, is greater than 1, and in the cases of word frequencies, publication, [...] is close to 2.

(c) In the cases of word frequencies, publications and biological genera, the function (1·1)
describes the distribution not merely in the tail but also for small values of i. In these cases
the ratio f(2)/f(1) is generally in the neighbourhood of one-third, and almost never reaches
one-half; while f(1)/n, where $n = \sum_i f(i)$, is generally in the neighbourhood of one-half.

† I have benefited from helpful comments from Messrs Benoit Mandelbrot, Robert Solow and
C. B. [name illegible], and from a Ford Foundation grant-in-aid that made the completion of
this work possible.
‡ See Zipf (1949) for numerous examples of distributions with this property.

Property (a) is characteristic of the 'contagious' distributions, for example, the negative
binomial as it approaches its limiting form, Fisher's logarithmic series distribution. However,
in the case of the negative binomial, k cannot exceed unity (and equals unity only in


the limiting case of the log series); and if the distribution has a long tail, so that the convergence
factor, b, is close to unity, f(2)/f(1) cannot be less than one-half. Hence the negative
binomial and Fisher's log series distributions do not provide a satisfactory fit for data
possessing property (a) together with either (b) or (c).

It is well known that the negative binomial and the log series distributions can be obtained 
as the stationary solutions of certain stochastic processes. For example, J. H. Darwin
(1953) derives these from birth and death processes, with appropriate assumptions as to 
the birth- and death-rates and the initial conditions. In this paper we shall show that 
stochastic processes closely similar to those yielding the negative binomial or log series 
distributions lead to a class of functions having the three properties enumerated above. 
This class of functions is given by 


\[ f(i) = A\,B(i, \rho+1), \tag{1·2} \]
where A and ρ are constants, and B(i, ρ+1) is the Beta function of i, ρ+1:
\[ B(i,\rho+1) = \int_0^1 \lambda^{i-1}(1-\lambda)^{\rho}\,d\lambda = \frac{\Gamma(i)\,\Gamma(\rho+1)}{\Gamma(i+\rho+1)} \quad (0<i;\ 0<\rho<\infty). \tag{1·3} \]
Now it is a well-known property of the Gamma function (Titchmarsh, 1939, p. 58) that
as i → ∞, and for any constant, k,
\[ \frac{\Gamma(i)}{\Gamma(i+k)} \sim i^{-k}. \tag{1·4} \]
Hence, from (1·3), we have, as i → ∞:
\[ f(i) \sim A\,\Gamma(\rho+1)\, i^{-(\rho+1)}. \tag{1·5} \]

Therefore, the distribution (1·2) approximates the distribution (1·1) in the tail (more
precisely, through the range in which the convergence factor of the latter is close to one).
Further, if ρ is positive, k will be greater than 1, as required by (b); and if ρ is equal to 1,
k will be equal to 2. It is easy to see that in the latter case we will have

\[ f(i) = \frac{A}{i(i+1)}, \qquad \sum_{i=1}^{\infty} f(i)/A = 1, \tag{1·6} \]

so that f(2)/f(1) = 1/3; and f(1)/n = 1/2, as required by (c).
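
The properties claimed for the case ρ = 1 can be checked numerically. The following is a minimal sketch, not part of the original paper: it evaluates the function A B(i, ρ+1) through log-gamma functions, taking A = 1, and the helper names `beta_fn` and `yule` are illustrative assumptions.

```python
from math import exp, lgamma, log

def beta_fn(i, rho):
    """B(i, rho + 1) = Gamma(i) Gamma(rho + 1) / Gamma(i + rho + 1), via log-gammas."""
    return exp(lgamma(i) + lgamma(rho + 1) - lgamma(i + rho + 1))

def yule(i, rho, A=1.0):
    """f(i) = A * B(i, rho + 1); A = 1 is an arbitrary choice for illustration."""
    return A * beta_fn(i, rho)

rho = 1.0
ratio_21 = yule(2, rho) / yule(1, rho)             # f(2)/f(1)
n = sum(yule(i, rho) for i in range(1, 200_001))   # partial sum standing in for n
share_1 = yule(1, rho) / n                         # f(1)/n

# Log-log slope between i = 10,000 and i = 20,000; the tail exponent is -(rho + 1).
slope = (log(yule(20_000, rho)) - log(yule(10_000, rho))) / (log(20_000) - log(10_000))

print(ratio_21, share_1, slope)
```

Under these assumptions the ratio comes out at one-third, the share of once-occurring items at about one-half, and the tail slope close to −2.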

In the remainder of this paper I propose: (a) to describe a stochastic process that leads 
to the stationary distribution (1-2); (b) to discuss some generalizations of this process; 
and (c) to construct hypotheses as to why the empirical phenomena mentioned above can 
be represented, approximately, by processes of this general kind. Before proceeding, I
should like to mention two earlier derivations, one of (1·2), the other of (1·1), that I have been
able to discover in the literature.

Some thirty years ago, G. Udny Yule (1924) constructed a probability model, with (1·2)
as its limiting distribution, to explain the distribution of biological genera by numbers of
species. He also derived a modified form of (1·2), replacing the complete Beta-function of
(1·3) by the incomplete Beta-function with upper limit of integration a < 1. (This modification
has the same effect as the introduction of the convergence factor, $b^{i}$, in (1·1): it
causes a more rapid decrease in f(i) for very large values of i; cf. also Darwin (1953, p. 378).)
It seems highly appropriate to call the distribution (1·2) the Yule distribution.

† The contrasting characteristics of distributions for which the log series provides a satisfactory
fit and those, under consideration here, for which it does not are illustrated by examples (i) and (ii),
respectively, in Good (1953).




Because Yule’s paper predated the modern theory of stochastic processes, his derivation 
was necessarily more involved than the one we shall employ here. Moreover, while the 
assumptions he required are plausible for the particular biological problem he treated, the 
corresponding assumptions applied to the four other phenomena we have mentioned appear 
much less plausible. Our derivation requires substantially weaker assumptions than Yule’s 
about the underlying probability mechanism. 

More recently D. G. Champernowne (1953) has constructed a stochastic model of income
distribution that leads to (1-1) and to generalizations of that function. Since the points of 
similarity between his model and the one under discussion here are not entirely obvious at 
a first examination, I shall consider their relation in a later section of this paper. 


II. THE STOCHASTIC MODEL 


For ease of exposition, the model will be described in terms of word frequencies. In a later 
section, alternative interpretations will be provided. Our present interest is in the kind of 
stochastic process that would lead to (1:2). 

Consider a book that is being written, and that has reached a length of k words. We 
designate by f(i, k) the number of different words that have occurred exactly i times in the 
first k words. That is, if there are 407 different words that have occurred exactly once each, 
then f(1, k) = 407. 

Assumption I. The probability that the (k+1)-st word is a word that has already appeared
exactly i times is proportional to i f(i, k), that is, to the total number of occurrences of all
the words that have appeared exactly i times.

Note that this assumption is much weaker than the assumption (I'): that the probability 
a particular word occur next be proportional to the number of its previous occurrences. 
Assumption (I') implies (I), but the converse is not true. Hence we leave open the possibility
that, among all words that have appeared i times, the probability of recurrence of
some may be much higher than of others. 

Assumption II. There is a constant probability, α, that the (k+1)-st word be a new word,
that is, a word that has not occurred in the first k words.

Assumptions (I) and (II) describe a stochastic process, in which the probability that a
particular word will be the next one written depends on what words have been written
previously. If this process correctly describes the selection of words, then the words in a
book cannot be regarded as a random sample drawn from a population with a prior distribution.
The reasonableness of the former, as compared with the latter type of explanation
of the observed distributions, will be discussed in § IV.

From (I), it follows that

\[ E\{f(i,k+1)\} - f(i,k) = K(k)\{(i-1) f(i-1,k) - i f(i,k)\} \quad (i = 2, \ldots, k+1), \tag{2·1} \]

for if the (k+1)st word is one that has previously occurred (i−1) times, f(i, k+1) will be
increased over f(i, k), and the probability of this, by assumption (I), is proportional to
(i−1) f(i−1, k); if the (k+1)st word is one that previously occurred i times, f(i, k+1) will
be decreased, and the probability of this, by assumption (I), is proportional to i f(i, k);
while in all other cases, f(i, k+1) = f(i, k).

From (I) and (II) we obtain similarly

\[ E\{f(1,k+1)\} - f(1,k) = \alpha - K(k) f(1,k) \quad (0<\alpha<1). \tag{2·2} \]


Since we will be concerned throughout with 'steady-state' distributions (as defined by
equation (2·8) below), we replace the expected values in (2·1) and (2·2) by the actual frequencies.
(Alternatively, we might replace frequencies on the right-hand side of the
equation by probabilities.) That is, we write, instead of (2·1) and (2·2),

\[ f(i,k+1) - f(i,k) = K(k)\{(i-1) f(i-1,k) - i f(i,k)\} \quad (i = 2,\ldots,k+1), \tag{2·3} \]
\[ f(1,k+1) - f(1,k) = \alpha - K(k) f(1,k), \tag{2·4} \]

where the f's now represent expected values.
Now, we wish to evaluate the factor of proportionality K(k). Since K(k) i f(i, k) is the
probability that the (k+1)st word is one that previously occurred i times, we must have

\[ \sum_{i=1}^{k} K(k)\, i f(i,k) = K(k) \sum_{i=1}^{k} i f(i,k) = 1-\alpha. \tag{2·5} \]

But $\sum_i i f(i,k)$ is the total number of words up to the kth, hence
\[ \sum_{i=1}^{k} i f(i,k) = k, \tag{2·6} \]
and
\[ K(k) = \frac{1-\alpha}{k}. \tag{2·7} \]

Substituting (2·7) in (2·3) and (2·4), we could solve these differential equations explicitly.
We can avail ourselves, however, of a simpler (though non-rigorous) method for discovering
the solutions, and can then test their correctness by substitution in the original
equations. Consider the 'steady-state' distribution in the following sense. We assume

\[ \frac{f(i,k+1)}{f(i,k)} = \frac{k+1}{k} \quad \text{for all } i \text{ and } k; \tag{2·8} \]
so that all the frequencies grow proportionately with k, and hence maintain the same
relative size. (Since we must have f(i, k) = 0 for i > k, equation (2·8) cannot hold exactly
for all i and k. But as explained above, we are concerned at the moment with heuristics
rather than proof.)

From (2·8) it follows that

\[ \frac{f(i,k)}{f(i-1,k)} = \frac{f(i,k+1)}{f(i-1,k+1)} = \beta(i), \tag{2·9} \]

where β(i) does not involve k. Hence, the relative frequencies, which we will designate by
f*(i), are independent of k. Substituting (2·7), (2·8) and (2·9) in (2·3), we get

\[ \left(\frac{k+1}{k} - 1\right) f(i,k) = \frac{1-\alpha}{k}\left\{\frac{i-1}{\beta(i)} - i\right\} f(i,k). \tag{2·10} \]

Cancelling the common factor, and solving for β(i), we obtain

\[ \beta(i) = \frac{(1-\alpha)(i-1)}{1+(1-\alpha)\,i} \quad (i = 2, \ldots, k). \tag{2·11} \]

For convenience, we introduce

\[ \rho = \frac{1}{1-\alpha} \quad (1<\rho<\infty). \tag{2·12} \]



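Since f*(i) = β(i) f*(i−1) = β(i) β(i−1) ... β(2) f*(1), we obtain from (2·11) and (2·12)

\[ f^*(i) = \frac{(i-1)(i-2)\cdots 2\cdot 1}{(i+\rho)(i+\rho-1)\cdots(2+\rho)}\, f^*(1) = \frac{\Gamma(i)\,\Gamma(\rho+2)}{\Gamma(i+\rho+1)}\, f^*(1) = (\rho+1)\,B(i,\rho+1)\, f^*(1). \tag{2·13} \]

The second relation follows from the fact that

\[ \Gamma(i+\rho+1) = (i+\rho)\,\Gamma(i+\rho) = (i+\rho)(i+\rho-1)\cdots(1+\rho)\,\Gamma(\rho+1). \tag{2·14} \]


But (2·13) is identical with (1·2) if we take A = (ρ+1) f*(1).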
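
The heuristic steady-state argument can be checked by iterating the expected-value recursion directly. The sketch below is an illustration under stated assumptions, not the author's procedure: it takes the recursion to be f(i, k+1) = f(i, k) + K(k)[(i−1) f(i−1, k) − i f(i, k)] for i ≥ 2 and f(1, k+1) = f(1, k) + α − K(k) f(1, k), with K(k) = (1−α)/k, starts from a single word at k = 1, and compares successive ratios with (1−α)(i−1)/(1+(1−α)i). The names `expected_spectrum` and `beta_ratio` are hypothetical.

```python
def expected_spectrum(k_max, alpha, i_max=10):
    """Iterate the expected-value recursion from k = 1 up to k = k_max.
    Only i <= i_max is tracked; this is exact for those i, because the update
    for f(i, .) involves only f(i, .) and f(i - 1, .)."""
    f = [0.0] * (i_max + 1)   # f[i] holds f(i, k); index 0 is unused
    f[1] = 1.0                # at k = 1 there is one word, seen once
    for k in range(1, k_max):
        K = (1.0 - alpha) / k
        new = f[:]
        new[1] = f[1] + alpha - K * f[1]
        for i in range(2, i_max + 1):
            new[i] = f[i] + K * ((i - 1) * f[i - 1] - i * f[i])
        f = new
    return f

def beta_ratio(i, alpha):
    """Steady-state ratio f*(i)/f*(i-1) = (1 - alpha)(i - 1) / (1 + (1 - alpha) i)."""
    return (1.0 - alpha) * (i - 1) / (1.0 + (1.0 - alpha) * i)

alpha, k_max = 0.2, 5000
f = expected_spectrum(k_max, alpha)
for i in range(2, 11):
    print(i, f[i] / f[i - 1], beta_ratio(i, alpha))
```

Truncating at `i_max` is a convenience that loses nothing for the tracked frequencies, since each one depends only on itself and its lower neighbour.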

That (2·13) is in fact a solution of (2·3) can be verified by direct substitution. Moreover,
it is in the following sense a stable solution. Suppose that (2·11) is not satisfied. Whatever
be the values of the f(i, k) for a given k, we may write without loss of generality

\[ \frac{f(i,k)}{f(i-1,k)} = \frac{(1-\alpha)(i-1)}{(1-\alpha)\,i + 1 + \epsilon(i,k)}, \tag{2·15} \]
where ε(i, k) is some function of i and k. If we now divide both sides of (2·3) by f(i, k) and
substitute (2·15) in the right-hand side of the resulting equation, we obtain after simplification

\[ \frac{f(i,k+1)}{f(i,k)} = \frac{k+1+\epsilon(i,k)}{k}. \tag{2·16} \]

Hence the ratio of f(i, k+1) to f(i, k) will be greater than (k+1)/k if ε(i, k) is positive, and
less than (k+1)/k if ε(i, k) is negative. Since new words are introduced at a constant rate,
$\sum_i f(i,k)$ must be proportional to k; therefore, by (2·16), we will have

\[ \sum_{i} f(i,k+1) = \frac{k+1}{k}\sum_{i} f(i,k), \quad\text{whence}\quad \sum_{i} \epsilon(i,k)\, f(i,k) = 0. \tag{2·17} \]

We may interpret the three equations, (2·15)-(2·17), as follows. In an average sense, the
frequencies will grow proportionately with k. If a particular frequency is 'too large'
compared with the next lower frequency (ε(i, k) negative in (2·15)), it will grow at a rate
slower than the average; if it is 'too small' (ε(i, k) positive), it will grow more rapidly than
the average.
It remains to be shown that f*(i) = (ρ+1) B(i, ρ+1) f*(1) is a proper distribution function.
In particular, we require that $\sum_{i=1}^{k} i\,B(i,\rho+1)$ converge as k → ∞. Now, it is well known that
$\sum_{i=1}^{\infty} i^{-a}$ converges for every a > 1. But by (1·4),
\[ \lim_{i\to\infty} \frac{i\,B(i,\rho+1)}{i^{-\rho}} = \Gamma(\rho+1). \tag{2·18} \]
Hence, by the usual ratio comparison test, $\sum_i i\,B(i,\rho+1)$ converges for ρ > 1, as required.
From the definition of α the total number, $n_k$, of different words will be αk; while the total
number of word occurrences is k. That is

\[ n_k = \sum_{i=1}^{k} f(i,k) = \alpha k = \alpha \sum_{i=1}^{k} i f(i,k). \tag{2·19} \]

Returning to (2·4), and using (2·8), we get

\[ \frac{1}{k}\, f^*(1) = \alpha - \frac{1-\alpha}{k}\, f^*(1), \tag{2·20} \]
whence
\[ f^*(1) = \frac{\alpha k}{2-\alpha} = \frac{n_k}{2-\alpha}. \tag{2·21} \]


From (2·12) and (2·21), and by successive application of (2·11), we can compute the
values of ρ, $f^*(1)/n_k$, $f^*(2)/n_k$, $f^*(3)/n_k$, etc., for given values of α (Table 1).


Table 1


f*(1)/n_k    f*(2)/n_k    f*(3)/n_k


0.500        0.167        0.083
0.527        0.169        0.082
0.556        0.171        0.080
0.588        0.171        0.077
0.667        0.167        0.067
0.769        0.144        0.046
0.909        0.076        0.012
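
The tabulated shares can be recomputed from the steady-state relations. The sketch below is offered as an illustration only. It assumes f*(1)/n_k = 1/(2−α), the ratio f*(i)/f*(i−1) = (1−α)(i−1)/(1+(1−α)i), and ρ = 1/(1−α). The α grid (0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9) is an inference, not a value read from the source: the α and ρ columns do not survive in the table above, and these values are the ones that reproduce its rows to within rounding. The name `table_row` is hypothetical.

```python
def table_row(alpha, terms=3):
    """Return rho and the shares f*(1)/n_k, ..., f*(terms)/n_k for a given alpha."""
    rho = 1.0 / (1.0 - alpha)
    share = 1.0 / (2.0 - alpha)              # f*(1)/n_k
    shares = [share]
    for i in range(2, terms + 1):
        share *= (1.0 - alpha) * (i - 1) / (1.0 + (1.0 - alpha) * i)
        shares.append(share)
    return rho, shares

# Inferred alpha values (an assumption; see the lead-in).
alphas = [0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9]

# Rows of Table 1 as they appear in the source.
printed = [
    (0.500, 0.167, 0.083),
    (0.527, 0.169, 0.082),
    (0.556, 0.171, 0.080),
    (0.588, 0.171, 0.077),
    (0.667, 0.167, 0.067),
    (0.769, 0.144, 0.046),
    (0.909, 0.076, 0.012),
]

rows = [table_row(a) for a in alphas]
for a, (rho, shares) in zip(alphas, rows):
    print(f"alpha={a:.1f}  rho={rho:.3f}  " + "  ".join(f"{s:.3f}" for s in shares))
```

Small third-decimal differences from the printed rows are possible; the comparison is meant only to show that the inferred α grid is consistent with the table.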


Thus far we have considered the case where α, the rate at which new words are introduced,
is independent of k. We can easily generalize to the case where α is a function of k
by making the appropriate substitution in (2·4). The equations can then be solved directly,
but the method employed to obtain a 'steady-state' distribution is not applicable, since it is
not easy to define what is meant by the steady state in this more general case. We will
content ourselves with some approximate results for two special cases. These special cases
will give us insight as to how a distribution function may arise which, for small values of i,
can be approximated by (1·2), with 0 < ρ < 1.

Case I. Suppose the system to be in the steady state described by (2·13) with $k = k_0$,
and that the flow of new words suddenly ceases, so that α(k) = 0 for $k > k_0$. We will now have
K(k) = 1/k for $k > k_0$, and (2·4) becomes

\[ f(1,k+1) = \left(1-\frac{1}{k}\right) f(1,k) = \frac{k-1}{k}\, f(1,k). \tag{2·22} \]
We define
\[ \gamma(i) = \frac{f(i,k+1)}{f(i,k)} \quad (i = 2,\ldots,k+1). \tag{2·23} \]


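Since no new words are being introduced, we must have

\[ n_k = f(1,k) + \sum_{i=2}^{k} f(i,k) = f(1,k+1) + \sum_{i=2}^{k+1} f(i,k+1) = \frac{k-1}{k}\, f(1,k) + \sum_{i=2}^{k+1} \gamma(i)\, f(i,k), \tag{2·24} \]

whence
\[ \frac{\sum_{i=2}^{k} [\gamma(i)-1]\, f(i,k)}{\sum_{i=2}^{k} f(i,k)} = \frac{f(1,k)}{k \sum_{i=2}^{k} f(i,k)}. \tag{2·25} \]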


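Let us define next
\[ \frac{f(i,k)}{f(i-1,k)} = \frac{i-1}{i+\rho_i} \tag{2·26} \]
(where we suppose that $\rho_i$ changes only slowly with k). Instead of (2·3), we have
\[ f(i,k+1) - f(i,k) = \frac{1}{k}\,[(i-1) f(i-1,k) - i f(i,k)]. \tag{2·27} \]
Substituting (2·23) and (2·26) in this, we get
\[ \gamma(i) - 1 = \frac{1}{k}\,[(i+\rho_i) - i], \tag{2·28} \]
whence
\[ \rho_i = k\,(\gamma(i) - 1), \tag{2·29} \]
and
\[ \bar\rho = \frac{\sum_{i=2}^{k} \rho_i\, f(i,k)}{\sum_{i=2}^{k} f(i,k)} = \frac{f(1,k)}{\sum_{i=2}^{k} f(i,k)} = \frac{f(1,k)}{n_k - f(1,k)}. \tag{2·30} \]
Define
\[ p_1 = f(1,k)/n_k. \tag{2·31} \]
Then
\[ \bar\rho = \frac{p_1}{1-p_1} \quad\text{and}\quad 0<\bar\rho<\infty. \tag{2·32} \]
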

Proceeding heuristically, we can see that after α becomes zero, f(1, k) will begin to decrease
with k, and the value of $\rho_i$ will be larger the larger is i. For small values of i, we will have
$\rho_i < \bar\rho$, and for large values, $\rho_i > \bar\rho$. However, the tail of the distribution will be affected
only slowly by the change in α. Hence, we may suppose that $\lim_{i\to\infty} \rho_i = \rho_0$, where $\rho_0$ is $\rho(k_0)$.

On the other hand, since the weighted average in (2·30) is heavily influenced by the large
frequencies for small values of i, $\rho_i$ (for small i) will be only slightly less than $\bar\rho$. Hence we may expect
the distribution to take the form of a slightly curved line on a double-log scale, with a slope
of $-(\bar\rho+1)$ at the lower end, and a slope of $-(\rho_0+1)$ at the upper end. If $\rho_0 > 1$, then
$\sum_i i f(i,k)$ will converge. An example of such a distribution will be given in § IV.

Case II. A second approximate solution can be obtained if we assume that α decreases
with k, but very slowly. By definition, we have $\alpha = dn_k/dk = n'_k$. The condition for a steady
state (all frequencies increasing proportionately) is now

\[ f(i,k+1) = \frac{n_{k+1}}{n_k}\, f(i,k). \tag{2·33} \]

Substituting as before, (2·7) and (2·33) in (2·3), we again obtain (2·13), where ρ is now
given by
\[ \rho = \frac{n'_k\, k}{n_k}\cdot\frac{1}{1-\alpha}. \tag{2·34} \]

The slope obtained in the derivation for constant α has now been multiplied by the factor
$(n'_k k)/n_k$, which for monotonically decreasing α is less than one. Hence, the effect of a
decrease in the rate of introduction of new words is to lengthen the tail of the distribution,
as was also true in case I. If the new value of ρ is less than one, we do not have a proper
distribution function (see equation (2·18)), hence the equation can hold only for small and
moderate values of i, and there must be a curve (on a logarithmic scale) in the tail of the
distribution.


III. AN ALTERNATIVE FORMULATION OF THE PROCESS


There are some alternative ways for deriving the relation (2-13). One of these will be useful 
to us when we come, in the next section, to a more specific discussion of word frequencies 
and frequencies of publications. Moreover, this derivation avoids the difficulties we have 
encountered in the definition of ‘steady state’. 

Equation (2·10) may be written

\[ 0 = (1-\alpha)\,[(i-1) f^*(i-1) - i f^*(i)] - f^*(i) \quad (i = 2,\ldots,k), \tag{3·1} \]

where we have again written f*(i) for f(i, k).

Similarly, from (2·4), we obtain

\[ 0 = 1 - (1-\alpha) f^*(1) - f^*(1). \tag{3·2} \]

These two equations may be interpreted as follows. We consider a sequence of k words.
We add words to the sequence in accordance with assumptions (I) and (II) of § II, but we
drop words from the sequence at the same average rate, so that the length of the sequence
remains k. The method according to which we drop words is the following:

Assumption III. If one representative of a particular word is dropped, then all representatives
of that word are dropped, and the probability that the next word dropped be [...]

\[ f(i,m+1) - f(i,m) = (1-\alpha)\,[(i-1) f(i-1,m) - i f(i,m)] - f(i,m), \tag{3·3} \]
where m is now not the total number of words (which remains a constant, k), but the number
of additions to (and withdrawals from) an initial arbitrary sequence of k words. Since the
k of this process, unlike that of § II, remains constant, the ordinary proofs of the existence
of a unique steady-state solution will apply (see Feller, 1950, p. 373), and we avoid the
troublesome questions of rigour that confronted us in § II.

The solution of (3·1) and (3·2) is, of course, again given by

\[ \frac{f^*(i)}{f^*(i-1)} = \frac{(1-\alpha)(i-1)}{1+(1-\alpha)\,i}. \tag{2·11} \]

If we were to replace the last terms of (3·1) and of (3·2), respectively, by terms corresponding
to the usual form of the death process, we would have (cf. Darwin, 1953, p. 375;
and Kendall, 1948)

\[ 0 = (1-\alpha)\,[(i-1) f^*(i-1) - i f^*(i)] - [i f^*(i) - (i+1) f^*(i+1)] \quad (i = 2,\ldots,k-1), \tag{3·4} \]

\[ 0 = 1 - (1-\alpha) f^*(1) - [f^*(1) - 2 f^*(2)]. \tag{3·5} \]
The solution of this system of equations is

\[ \frac{f^*(i)}{f^*(i-1)} = \frac{(1-\alpha)(i-1)}{i}, \tag{3·6} \]

which is Fisher's logarithmic series distribution.
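
That the logarithmic-series ratio satisfies the modified balance equations can be confirmed by exact arithmetic. The sketch below is an illustration under stated assumptions: it reads the balance equation as 0 = (1−α)[(i−1) f*(i−1) − i f*(i)] − [i f*(i) − (i+1) f*(i+1)] and the boundary equation as 0 = 1 − (1−α) f*(1) − [f*(1) − 2 f*(2)], takes f*(1) = 1, and uses rational numbers so that the residuals vanish exactly. The function names and α = 1/5 are hypothetical.

```python
from fractions import Fraction

def log_series(alpha, i_max):
    """Build f*(1..i_max) from the ratio f*(i)/f*(i-1) = (1 - alpha)(i - 1)/i, with f*(1) = 1."""
    f = {1: Fraction(1)}
    for i in range(2, i_max + 1):
        f[i] = f[i - 1] * (1 - alpha) * (i - 1) / i
    return f

def balance_residual(f, alpha, i):
    """Right-hand side of the balance equation at i: birth term minus the usual death term."""
    birth = (1 - alpha) * ((i - 1) * f[i - 1] - i * f[i])
    death = i * f[i] - (i + 1) * f[i + 1]
    return birth - death

alpha = Fraction(1, 5)
f = log_series(alpha, 20)
residuals = [balance_residual(f, alpha, i) for i in range(2, 20)]
boundary = 1 - (1 - alpha) * f[1] - (f[1] - 2 * f[2])
print(residuals[:3], boundary)
```

The closed form f*(i) = (1−α)^(i−1)/i follows from the same ratio, which is the logarithmic series up to a constant factor.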




Since the log series distribution is a limiting case of the negative binomial, we may ask
whether there is a distribution that stands in the same relation to the latter as (2·11) stands
in relation to (3·6). We can obtain such a distribution by a modification of the birth process
in (3·1). We assume now that the birth-rate is the sum of two components, one proportional
to i f(i), the other proportional to f(i). In place of (3·1) we have

\[ 0 = \frac{(1-\alpha)k}{k+c}\,[(i-1+c) f^*(i-1) - (i+c) f^*(i)] - f^*(i) \quad (c \text{ a constant}), \tag{3·7} \]

the solution of which is

\[ \frac{f^*(i)}{f^*(i-1)} = \frac{\lambda\,(i-1+c)}{1+\lambda\,(i+c)} = \frac{i-1+c}{i+c+1/\lambda}, \tag{3·8} \]

where λ = k(1−α)/(k+c).


A rather remarkable property of (3·8) is that in the tail it still has the limiting form (1·1)
with b = 1. Hence for α and c small, this generalized Yule distribution will still possess the
three properties listed in the introduction. The fact that a reasonably wide range of variation
in the assumptions underlying the stochastic model does not alter greatly the form of the
distribution adds plausibility to the use of such stochastic processes to explain the observed
distributions. Our next task is to consider these explanations in more detail.


IV. THE EMPIRICAL DISTRIBUTIONS 


In this section I shall try to give theoretical justifications for the observed fit of the Yule
distribution to a number of different sets of empirical data.


A. Word frequencies

A substantial number of word counts have been made, in English and in other languages
(see Hanley, 1937; Thorndike, 1937; Yule, 1944; Zipf, 1949; and Good, 1953). Equation
(1·6) provides a good fit to almost all of them. When the more general function, (1·2), is
used, the estimated value of ρ is always close to 1. When a convergence factor, $b^{i}$, is introduced
to account for the deficiency in frequencies for very large values of i, the estimated
value of b is also very close to 1. Good (1953), for instance, applies (1·6) multiplied by a
convergence factor to the Eldridge count, and obtains b = 0.999667.

These regularities are the more surprising in that the various counts refer to a quite
heterogeneous set of objects. In the Yule and Thorndike counts, inflected forms are
counted with the root word; in most of the other counts each form is regarded as a distinct
word. The Yule counts include only nouns; the others, all parts of speech. The Dewey,
Eldridge and Thorndike counts are composite, compiled from a large number of separate
writings; most of the others are based on a single piece of continuous prose. I would regard
this heterogeneity as further evidence that the explanation is to be sought in a probability
mechanism rather than in more specific properties of language; but at the same time, the
heterogeneity complicates the task of specifying the probability mechanism in detail.
I shall avoid questions of 'fine structure' (which would require an expertness in linguistics
that I do not possess) and confine myself to three broad problems: (1) the distribution of
word frequencies in the whole historical sequence of words that constitutes a language;
(2) the distribution of word frequencies in a continuous piece of prose; (3) the distribution
of word frequencies in a sample of prose assembled from composite sources.




(1) For obvious reasons, we do not have any empirical data on the cumulated word
frequencies for a whole language. On a priori grounds, it does not appear unreasonable to
postulate that these frequencies are determined by a process like that described in § II. The
parameter α is then the rate at which neologisms appear in the language as a fraction of all
word occurrences, and hence α can be assumed to be very close to zero.

(2) The process of § II might also describe the growth of a continuous piece of prose— 
for example, Joyce's Ulysses. But there are some serious objections to this hypothesis. An 
author writes not only by processes of association —i.e. sampling earlier segments of the 
word sequence—but also by processes of imitation —i.e. sampling segments of word sequences 
from other works he has written, from works of other authors, and, of course, from sequences 
he has heard. The model of § II apparently allows only for association, and excludes imitation.

The word frequencies in Ulysses provide obvious evidence of the importance of both 
processes. The fact that the proper noun ‘Bloom’ occurs 926 times and ranks 30th in 
frequency must be attributed to association. If Joyce had named his hero ‘Smith’, that 
noun, instead of ‘Bloom’, would have ranked 30th. On the other hand, ‘they’, which 
occurs 1010 times in Ulysses and ranks 27th, has very nearly the same rank—the 28th— 
in the Dewey count. In fact, of the 100 most frequent words in Ulysses, 78 are among the
top 100 in the Dewey count. This similarity in ranking of ‘common’ words argues for 
imitation rather than association. Even for the common words, however, the variations 
in frequency from one count to another are far too great to be explained as fluctuations 
resulting from random sampling from a common population of words. The imitative process 
must involve stratified sampling, and imitation must be compounded with association. 

It is worth emphasizing again at this point that assumption (I) does not require that the
choice of the next word from among those previously written be completely random. Suppose,
for example, that a writer were to assign to each page he has already written a number,
$p_j$, $\sum_j p_j = (1-\alpha)$, the size of $p_j$ varying with the 'affinity' of the subject discussed on the
jth page to the subject next to be discussed. If his next word were selected by a stratified
sampling of the previous pages, with probability $p_j$ for each page, then equation (2·1)
would generally be satisfied. For although individual words would be distributed unevenly
through the preceding pages, the totality of words having a given frequency, i, in all the
previous pages taken together would be distributed almost evenly through these pages.
Hence, the various frequency strata would have proportionate probabilities of being
sampled, for most choices of the $p_j$. This is all that is required for equation (2·1). This same
comment applies to the assumption we shall subsequently make regarding imitative
sampling from other works.

Let us now reconsider the problem of a piece of continuous prose. Since both the processes
of association and imitation are involved, the sequence that is counted is to be regarded as
a 'slice', of length k, of the entire sequence of words in the language, or of the entire sequence
written by the author. Hence the word count is better described by the stochastic process
of § III than by the process of § II.

In determining the probability that a word selected in such a sequence be one that has
occurred exactly i times, we must consider separately the process of imitation and association.
Assume that, on the average, a fraction, β, of the words added is selected by imitation,
and the remaining fraction, (1−β), by association. Since no new words can be introduced
by association, the joint probability that the next word will be selected by association and
will be a word that has already occurred i times is (1−β) i f(i, k)/k.




The words selected by imitation present a more difficult problem, and we shall have to 
content ourselves with a reasonable assumption that has no rigorous justification. On the 
average, a word that has occurred i times will have a chance less than i/k of being the next 
one chosen by imitation, because in the sequence that is being sampled there are words 
that have not yet been chosen at all, and because with progressive change of subject, dif- 
ferent strata of the language will be sampled. Since words with large i will generally be 
‘common’ words, fairly uniformly distributed through all strata of the language, the 
deficiency may be expected to be proportionately greater for small i than for large i. As
a rough, but reasonable, approximation let us assume that the joint probability that the
next word will be selected by imitation and will be a word that has already occurred i times
is $\beta(i-c)f(i,k)/k$, where $0<c<1$. (Our result would not be essentially altered if we wrote
$c(i)$ instead of $c$, provided only that $c(i)$ does not vary a great deal.)

Adding the two joint probabilities, for association and imitation respectively, we
find that the total probability that the next word be one that has occurred i times is
$(i-\beta c)f(i,k)/k$. By summing this probability over i and subtracting from 1, we find that
the probability that the next word be a new word is $\beta c\,(n_k/k)$.
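
The bookkeeping in the preceding paragraphs can be checked with exact arithmetic. The sketch below is illustrative only: it assumes the association term is (1 − β) i f(i,k)/k and the imitation term is β (i − c) f(i,k)/k, and the frequency table `f` and the values of `beta` and `c` are hypothetical, not data from the paper.

```python
from fractions import Fraction

def next_word_probabilities(f, beta, c):
    """Split the next-word probability by the frequency class i of the chosen word.

    f maps i -> f(i, k), the number of distinct words seen exactly i times.
    """
    k = sum(i * n for i, n in f.items())    # words counted so far
    n_k = sum(f.values())                   # distinct words so far
    by_class = {}
    for i, n in f.items():
        association = (1 - beta) * i * n / Fraction(k)
        imitation = beta * (i - c) * n / Fraction(k)
        by_class[i] = association + imitation
    # Whatever probability is left over must go to a new word.
    p_new = 1 - sum(by_class.values())
    return k, n_k, by_class, p_new

# Hypothetical frequency table and parameters, chosen only for illustration.
f = {1: 50, 2: 20, 3: 10, 10: 2}
beta, c = Fraction(1, 4), Fraction(1, 5)
k, n_k, by_class, p_new = next_word_probabilities(f, beta, c)
print(p_new == beta * c * Fraction(n_k, k))
```

Using `Fraction` keeps the identity exact, so the comparison is not blurred by floating-point rounding.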

If the method of dropping words from the sequence satisfies assumption (III) of § III,
we set the difference between the birth-rate and the death-rate equal to zero, and obtain
the steady-state equation

\[ 0 = (i-c\beta-1)\,f^*(i-1) - (i-c\beta)\,f^*(i) - f^*(i), \tag{4·1} \]

which has as its solution

\[ f^*(i) = A\,B(i-c\beta,\,2). \tag{4·2} \]


Again, we obtain a distribution with the required properties. 
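
As a rough numerical check, and assuming the steady-state equation (4·1) reads 0 = (i − cβ − 1) f*(i − 1) − (i − cβ) f*(i) − f*(i), the sketch below verifies that a Beta-function form f*(i) = A B(i − cβ, 2) satisfies it. The value of `c_beta` is hypothetical, and the helper names are not from the paper.

```python
import math

def beta_fn(a, b):
    """Complete Beta function B(a, b), computed through log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def steady_state_residual(i, c_beta, f_star):
    """Left-over of the assumed steady-state equation at frequency i."""
    return ((i - c_beta - 1) * f_star(i - 1)
            - (i - c_beta) * f_star(i)
            - f_star(i))

c_beta = 0.05                                   # hypothetical value of c * beta

def f_star(i):
    return beta_fn(i - c_beta, 2.0)             # candidate solution with A = 1

residuals = [steady_state_residual(i, c_beta, f_star) for i in range(2, 50)]
print(max(abs(x) for x in residuals))
```

Since B(x, 2) = 1/(x(x + 1)), the ratio f*(i)/f*(i − 1) equals (i − cβ − 1)/(i − cβ + 1), which is exactly what the equation requires.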

(3) The distribution of word frequencies in a sample of prose assembled from composite 
sources can be explained along the same general lines. Again, we may regard the sample as 
a ‘slice’ from a longer sequence, but we might expect the parameters $c$ and $\beta$ to be somewhat
larger than in a comparable piece of continuous prose. The qualification ‘comparable’ is
important, for c may be expected to be smaller for homogeneous prose using a limited
vocabulary of common words than for prose with a large vocabulary and treating of a
variety of subjects. Hence c might well be larger for the continuous Ulysses count than for
the Eldridge count, which is drawn from newspaper sources. Indeed, the empirical evidence
suggests that this is the case.

There is no point in elaborating the explanation further. What has been shown is that the
observed frequencies can be fitted by distributions derived from probability assumptions
that are not without plausibility.

A very different and very ingenious explanation of the observed word-frequency data has
been advanced recently by Dr Benoit Mandelbrot (1953). His derivation rests on the
assumption that the frequencies are determined so as to maximize the number of bits of
information, in the sense of Shannon, transmitted per symbol. There are several reasons
why I prefer an explanation that employs averaging rather than maximizing assumptions.
First, an assumption that word usage satisfies some criterion of efficiency appears to be
much stronger than the probability assumptions required here. Secondly, numerous
doubts, which I share, have been expressed as to the relevance of Shannon's information
measure for the measurement of semantic information.




Before leaving the subject of word frequencies, it may be of interest to look at some of
the empirical data. Good (1953, pp. 257-60) has obtained good fits to the Eldridge count
and to one of Yule's counts by the use of equation (1·6). Table 2 summarizes a few
data on two word counts, and compares the actual frequencies, f(1), f(2) and f(3), with the
frequencies estimated from equation (1·3). The actual values of $k$ and $n_k$ are used to estimate
$\alpha = n_k/k$, and (2·11) and (2·21) to obtain the expected frequencies. In both cases the
observed value of $n_k/k$ leads to an estimate of ρ in the neighbourhood of 1·1 to 1·2. An empirical
fit to the whole distribution of a function of the form $f(i) = K i^{-(\rho+1)}$ gives an estimated
value of ρ, in both cases, of about one, in reasonable agreement with the first estimate.
A good fit to both the Ulysses and the Eldridge counts can also be obtained from (4·2)
with c equal to about 0·2 in the former case, and close to zero in the latter.

In the case of Thorndike's count of 4½ million words in children's books (Thorndike, 1937)
we may assume that the supply of new words was virtually exhausted before the end of the
count. In his count f(1) is substantially below 0·5 $n_k$ (about 0·34 $n_k$), as we would expect under
these circumstances (see case I of § II). Thorndike estimated the empirical value of ρ as
0·45, which is entirely consistent with the observed value of 0·34 for f(1). For, by (2·…),
$P_1 = \rho/(\rho+1) = 0·31$.
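
The arithmetic behind the Thorndike comparison can be reproduced in a few lines. This is a minimal sketch, assuming the Yule form f(i) = A B(i, ρ + 1) normalized so that the fractions sum to one (A = ρ), which gives a first term ρ/(ρ + 1) and the ratio f(i)/f(i − 1) = (i − 1)/(i + ρ); the function name is illustrative.

```python
def yule_fractions(rho, i_max):
    """Fractions f(i)/n_k for i = 1..i_max under the Yule form rho * B(i, rho + 1)."""
    fractions = [rho / (rho + 1.0)]              # f(1)/n_k
    for i in range(2, i_max + 1):
        # Ratio of successive Beta functions: B(i, rho+1) / B(i-1, rho+1).
        fractions.append(fractions[-1] * (i - 1) / (i + rho))
    return fractions

thorndike = yule_fractions(0.45, 3)
print(round(thorndike[0], 2))   # 0.31, to be compared with the observed 0.34
```

With ρ = 1 the same function gives f(i)/n_k = 1/(i(i + 1)), so half of all distinct words would occur exactly once.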

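Table 2. [Word-count comparison for Ulysses (Hanley, 1937) and Eldridge (Good, 1953); the numerical entries of the table are not recoverable from this copy.]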


B. Scientific publications 
At least four sets of data are available on the number, f(i), of authors contributing a
given number, i, of papers each to a journal or journals (Davis, 1941; Leavens, 1953).
These are counts of (a) papers written by members of the Chicago Section of the American
Mathematical Society over a 25-year period; (b) papers listed in Chemical Abstracts (under
A and B) over 10 years; (c) papers referred to in a history of physics; and (d) papers and
abstracts in Econometrica over a 20-year period.

We may postulate a mechanism like that of § III, equation (3·1). The authorship of the
next paper to appear is ‘selected’ by stratified sampling from the strata of authors who have
previously published 1, 2, ..., papers, the probability for each stratum being proportional
to i f(i). Again, the probabilities for individual authors need not be proportional to i, but
only the probabilities for the aggregates of authors with the same i. For example (as in the
case of words), the probability for a particular author may be higher if he has published
recently than if he has not. The gradual retirement of authors corresponds to assumption (III).
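
A small simulation makes the postulated mechanism concrete. The sketch below is a simplification under stated assumptions: it ignores the retirement of authors (assumption (III)), uses a constant probability `alpha` that the next paper comes from a new author, and otherwise copies the author of a randomly chosen earlier paper, which selects each stratum with probability proportional to i f(i). The parameter values are hypothetical.

```python
import random
from collections import Counter

def simulate_authorship(n_papers, alpha, seed=0):
    """Return a Counter mapping i -> number of authors with exactly i papers."""
    rng = random.Random(seed)
    author_of_paper = [0]                 # the first paper belongs to author 0
    n_authors = 1
    for _ in range(n_papers - 1):
        if rng.random() < alpha:
            # A new author enters with a first paper.
            author_of_paper.append(n_authors)
            n_authors += 1
        else:
            # Picking an earlier paper at random favours prolific authors
            # in proportion to their number of papers.
            author_of_paper.append(rng.choice(author_of_paper))
    papers_per_author = Counter(author_of_paper)
    return Counter(papers_per_author.values())

alpha = 0.3                               # hypothetical rate of new authors
f = simulate_authorship(100_000, alpha, seed=1)
n_authors = sum(f.values())
share_single = f[1] / n_authors
print(share_single, 1 / (2 - alpha))
```

For a long run the simulated share of one-paper authors should sit close to 1/(2 − α), the value implied by the Yule distribution with ρ = 1/(1 − α).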
A comparison of the actual frequencies, for i from 1 to 10, with the estimated frequencies
derived from (2·11) and (2·21), is shown in Table 3. The fit is reasonably good, when it is
remembered that only one parameter is available for adjustment. However, it should be
noted that the estimated frequencies tend to be too high for i = 1, 2 and too low for i = 3, ...,
10. In three of the four cases, they are again too high for the tails of the distributions.
A further refinement of the model is apparently needed to remove these discrepancies.


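Table 3. Number of persons contributing

[The body of the table (actual and estimated numbers of persons making 1, 2, ..., 10 and 11 or more contributions, with the estimated α and estimated ρ) is not recoverable from this copy.]

† Davis (1941).
‡ Leavens (1953).
§ ρ estimated in this case from (2·31) to (2·32).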


C. City sizes 
It has been observed, for every U.S. Census since the early nineteenth century, and for
most other Western countries as well, that if F(i) is the number of cities of population
greater than i, then

\[ F(i) \sim A\,i^{-\rho}, \tag{4·3} \]

where ρ is close to 1 (see Zipf, 1949, chs. 9, 10).

We would expect such a distribution if the underlying mechanism were one
describable by equations like (2·3) and (2·4). Such a mechanism is […].
First, equation (2·3) would hold if the growth of population were due solely to the net
excess of births over deaths, and if this net growth were proportional to present population.
This assumption is certainly satisfied at least roughly. Moreover, it need not hold for each
city, but only for the aggregate of cities in each size class. […] the equation
would still be satisfied if there were net migration to or from cities of particular regions,
provided the net addition or loss of population […] were
proportional to city size. That is, even if all California cities were growing, and all New
England cities declining, the equation would hold provided the percentage growth or
decline in each area were uncorrelated with city size.




In the case of cities, equation (4·3) could only be expected to hold down to some minimum
city size, say 5000 or 10,000. The constant α would then be interpreted as the fraction of
the total population growth in cities above the minimum size that is accounted for by the 
new cities that reach that size. 


D. Income distribution 


Vilfredo Pareto is generally credited with the discovery that if personal incomes are
ranked by size, the number of persons, F(i), whose incomes exceed i can be approximated
closely, for the upper ranges of income, by equation (4·3) with ρ usually in the neighbourhood
of 1·5 (Davis, 1941; Champernowne, 1953). Hence, the income distributions bear a family
resemblance in their upper ranges to those we have already considered, although the
parameter, ρ, is substantially larger than 1, its characteristic value in the case of word
frequencies and city size distributions.

A stochastic mechanism similar to those described in § III would again produce steady-state
distributions closely resembling the observed ones. We picture the stream of income
as a sequence of dollars allocated probabilistically to the recipients. If the total annual
income of all persons above some specified minimum income is k dollars, the segment of
this sequence running from the mth to the (m + k)th dollar is the income for the year beginning
at time m. We assume that the probability that the next dollar will be allotted to some
person with an annual income of i dollars is proportional to $(i+c)f(i)$, with c positive but
small. This represents a modification of assumption (I) that decreases the proportion of the
total stream going to persons of high income relative to the proportion going to persons with
incomes close to the minimum. We assume that a fraction of the dollars is assigned to new
persons, i.e. persons reaching the minimum income to which the assumptions apply
(assumption (II)). We assume that there is considerable variance among persons within
each income class in the probability of receiving additional income, so that the rate at which
dollars are dropped from any income class as m increases satisfies assumption (III). Then
we obtain again equation (3·8), which now holds for i greater than the minimum income.
For large i, this distribution has the required properties with $1/\lambda = \rho$.

The same result has been reached by D. G. Champernowne (1953), following a somewhat
different route. He divides income recipients at time $t_0$ into classes of equal proportionate
width. That is, if $i_m$ is the minimum income considered, then the first class contains persons
with incomes between $i_m$ and $r i_m$, the second class, persons with incomes between $r i_m$ and
$r^2 i_m$, and so on. Next he introduces transition probabilities $p_{gh}$ that a person who is in class
g at time $t_0$ will be in class h at time $t_1$. He assumes that $p_{gh}$ is a function only of $(g-h)$.
Now, by his definition of the income classes, the average income of persons in class g will
be about $r^{(g-h)}$ times the average income of persons in class h. Hence, the expected income
at $t_1$ of a person who was in class g at $t_0$ will be

\[ \sum_h p_{gh}\, i_h = \sum_h p_{gh}\, r^{h-g}\, i_g = \alpha\, i_g \quad (\alpha \text{ a constant}), \tag{4·4} \]

where $i_g$ is the average income in class g. Prof. Champernowne assumes explicitly that α < 1.
From this it is clear that his model satisfies our assumptions (I) (in its original form) and (II).
Further, since he assumes a substantial variance in income expectations among persons in
a given class, our assumption (III) is also approximately satisfied. Hence, in spite of the
surface differences between his model and those developed here, the underlying structure
is the same.
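
The step from transition probabilities that depend only on the class shift to a constant expected-growth factor can be illustrated numerically. The sketch below is a toy under stated assumptions: the shift probabilities, the class ratio `r` and the minimum income `i_m` are hypothetical, class g is given the representative income i_m r^g, and the boundary at the lowest class is ignored.

```python
def expected_income_factor(shift_probs, r):
    """Expected income at t1 divided by income at t0, for any starting class.

    shift_probs maps the class shift d = h - g to its probability.
    """
    return sum(p * r ** d for d, p in shift_probs.items())

def expected_next_income(g, shift_probs, r, i_m):
    """Expected income at t1 of a person in class g at t0."""
    return sum(p * i_m * r ** (g + d) for d, p in shift_probs.items())

# Hypothetical values chosen for illustration only.
shift_probs = {-2: 0.2, -1: 0.3, 0: 0.3, 1: 0.2}
r, i_m = 1.25, 1000.0

alpha = expected_income_factor(shift_probs, r)
ratios = [expected_next_income(g, shift_probs, r, i_m) / (i_m * r ** g)
          for g in range(3, 8)]
print(alpha, ratios)
```

Every starting class gives the same ratio, and with these numbers the factor is below one, in line with the explicit assumption that α < 1.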




E. Biological species 

We conclude this very incomplete list of phenomena exhibiting the Yule distribution by
mentioning the example originally analysed by Yule himself (1924). It was discovered by
Willis that the number, f(i), of genera of plants having i species each was distributed approximately
according to (4·3), with ρ < 1. Yule explained these data by a probability model in
which the probability, s, of a specific mutation occurring in a particular genus during a
short time interval was proportional to the number of species in the genus; while the
probability, r, of a generic mutation during the same interval was proportional to the
number of genera. Starting at $t_0$ with a single genus of one species, he computed the distribution
f(i, t) for $t_1, t_2, \ldots$, and found the limit as $t \to \infty$. This limiting distribution corresponds
to (2·13) with ρ = r/s. Yule observed that for r < s (as required to fit the empirical
data), this was not a proper distribution function, and obtained the approximate distribution
for t = T. His procedure was equivalent to replacing the complete Beta function
in (2·13) by the incomplete Beta function, taking as the upper limit of integration an
appropriate function of T.

If, in the process of § II, we define k as the total number of different species and f(i, k) as
the number of genera with exactly i species, we see that our k is a monotonic increasing
function of Yule's t (specifically, $k = e^{st}$). Making the appropriate transformation of variables,
we find that Yule's assumption with respect to the rate of specific mutation corresponds
to our assumption (I') (and hence is considerably stronger than the assumption we
employed in § II). Making the same transformation of variables with respect to his assumption
of a constant rate of generic mutation, we find that $n_k = e^{rt}$. We can then compute
α(k) (which will now vary with k) by taking the derivative of $n_k$ with respect to k. We obtain

\[ \alpha(k) = \frac{r}{s}\, e^{(r-s)t}. \tag{4·5} \]

If we substitute these values in equation (2·34) of case II, where we assumed slowly
changing α, we find in the limit, as $t \to \infty$, ρ = r/s, as required. Hence, we see that the
process of § II is essentially the same as the one treated by Yule.
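
The change of variables can be checked by differentiating numerically. This is a minimal sketch, assuming k = e^{st} and n_k = e^{rt}, so that n_k = k^{r/s} and α(k) = dn_k/dk = (r/s) e^{(r−s)t}; the rates `r` and `s` and the time `t` are hypothetical.

```python
import math

def n_of_k(k, r, s):
    """Number of genera as a function of the number of species: n_k = k**(r/s)."""
    return k ** (r / s)

def alpha_closed_form(t, r, s):
    """Closed form (r/s) * exp((r - s) t) for the derivative dn_k/dk."""
    return (r / s) * math.exp((r - s) * t)

r, s, t = 0.5, 1.0, 2.0            # hypothetical mutation rates and time
k = math.exp(s * t)
h = 1e-5
# Central difference approximation to dn_k/dk at the chosen k.
alpha_numeric = (n_of_k(k + h, r, s) - n_of_k(k - h, r, s)) / (2 * h)
print(alpha_numeric, alpha_closed_form(t, r, s))
```

For r < s the fraction of new genera α(k) falls steadily as the number of species grows, which is why α is treated as slowly changing rather than constant.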

It is interesting and a little surprising that when Yule, some twenty years after this dis- 
covery, examined the statistics of vocabulary, he did not employ this model to account for 
the observed distributions of word frequencies. Indeed, in his fascinating book on The 
Statistical Study of Literary Vocabulary (1944) he nowhere refers to his earlier paper on 
biological distributions. 

V. CONCLUSION 


This paper discusses a number of related stochastic processes that lead to a class of highly
skewed distributions (the Yule distribution) possessing characteristic properties that
distinguish them from such well-known functions as the negative binomial and Fisher's
logarithmic series. In § I, the distinctive properties of the Yule distribution were described.
In §§ II and III several stochastic processes were examined from which this distribution can
be derived. In § IV, a number of empirical distributions that can be approximated closely
by the Yule distribution were discussed, and mechanisms postulated to explain why they
are determined by this particular kind of stochastic process. The
derivations of §§ II and III were compared with models previously proposed by Yule (1924)
and Champernowne (1953) to account for the data on biological species and on incomes,
respectively.


The probability assumptions we need for the derivations are relatively weak, and of the 
same order of generality as those commonly employed in deriving other distribution 
functions—the normal, Poisson, geometric and negative binomial. Hence, the frequency 
with which the Yule distribution occurs in nature—particularly in social phenomena— 
should occasion no great surprise. This does not imply that all occurrences of this empirical 
distribution are to be explained by the process discussed here. To the extent that other 
mechanisms can be shown also to lead to the same distribution, its common occurrence is 
the less surprising. Conversely, the mere fact that particular data conform to the Yule 
distribution and can be given a plausible interpretation in terms of the stochastic model 
proposed here tells little about the underlying phenomena beyond what is contained in 
assumptions (I) through (III). 


REFERENCES 


CHAMPERNOWNE, D. G. (1953). A model of income distribution. Econ. J. 63, 318.

DARWIN, J. H. (1953). Population differences between species growing according to simple birth and
death processes. Biometrika, 40, 370.

DAVIS, HAROLD T. (1941). The Analysis of Economic Time Series. Principia Press.

FELLER, WILLIAM (1950). An Introduction to Probability Theory and Its Applications, vol. 1. Wiley.

GOOD, I. J. (1953). The population frequencies of species and the estimation of population parameters.
Biometrika, 40, 237.

HANLEY, MILES L. (1937). Word Index to James Joyce's Ulysses. University of Wisconsin Press.

KENDALL, DAVID G. (1948). On some modes of population growth leading to R. A. Fisher's logarithmic
series distribution. Biometrika, 35, 6.

LEAVENS, DICKSON H. (1953). Econometrica, 21, 630.

MANDELBROT, BENOIT (1953). An informational theory of the statistical structure of language. In
Communication Theory (ed. by Willis Jackson). Butterworths.

THORNDIKE, EDWARD L. (1937). On the number of words of any given frequency of use. Psychol.
Rec. 1, 399.

TITCHMARSH, E. C. (1939). The Theory of Functions, 2nd ed. Oxford University Press.

YULE, G. UDNY (1924). A mathematical theory of evolution, based on the conclusions of Dr J. C.
Willis, F.R.S. Phil. Trans. B, 213, 21.

YULE, G. UDNY (1944). The Statistical Study of Literary Vocabulary. Cambridge University Press.

ZIPF, GEORGE KINGSLEY (1949). Human Behavior and the Principle of Least Effort. Addison-Wesley
Press.




SIMULTANEOUS TESTS OF LINEAR HYPOTHESES 


By M. N. GHOSH
University of North Carolina and University of Calcutta 


1. INTRODUCTION 


A very common situation in the analysis of variance of survey data, where the investigator 
is not able to put his experimental data in the framework of a design, planned in advance, 
is that the estimates of the parameters are correlated in various ways and the analysis of 
variance becomes cumbersome. Even for the analysis of variance of a two-way classification 
with unequal class frequencies one meets with this difficulty. We shall be concerned here 
with the analysis of such data and the test of significance for groups of parameters repre- 
senting different aspects of the problem. We consider below a concrete example, in which 
standing height, sexual maturity characters and certain blood chemicals, like haemoglobin, 
ascorbic acid, carotene and alkaline phosphatase were measured for a number of girls between
the ages 9 and 14 years.* For the purpose of analysis the girls were divided into twenty-five
classes according to sexual maturity ratings from indices of breast ($B_i$) and pubic hair
development ($Ph_j$), each of these being a 5-point rating.
The following model is assumed: 


\[ y_{ijk} = \mu_{ij} + \beta_1 x_{1ijk} + \beta_2 x_{2ijk} + \beta_3 x_{3ijk} + \gamma z_{ijk} + \epsilon_{ijk}, \tag{1·1} \]

where $(y_{ijk}, x_{1ijk}, x_{2ijk}, x_{3ijk}, z_{ijk})$ (k = 1, 2, ..., N) are sample individuals. $y_{ijk}$ represents the
height increment of the kth girl in the maturity class $(B_i Ph_j)$; $x_{1ijk}, x_{2ijk}, x_{3ijk}$ are the nutritional
variables haemoglobin, ascorbic acid and carotene respectively, and $z_{ijk}$ is alkaline phosphatase.
We thus consider three groups of parameters $\{\mu_{ij}\}$ (i, j = 1, ..., 5), $\{\beta_1, \beta_2, \beta_3\}$ and
$\{\gamma\}$. The hypotheses corresponding to these three groups of parameters are

\[ \begin{aligned} H_\mu &: \mu_{ij} = \mu \quad (i, j = 1, 2, \ldots, 5),\\ H_\beta &: \beta_1 = 0,\ \beta_2 = 0,\ \beta_3 = 0,\\ H_\gamma &: \gamma = 0. \end{aligned} \tag{1·2} \]

Rejection of the hypothesis $H_\mu$ would imply that the height increment depends upon sexual
maturity, the rejection of $H_\beta$ would imply that the blood chemicals, which depend upon
nutritional status, affect height increment, and the rejection of $H_\gamma$ would imply that
alkaline phosphatase affects height increment.

The hypotheses $H_\mu$, $H_\beta$, $H_\gamma$ may be tested by the analysis of variance method, but these
tests are not independent, because of the non-orthogonality of the groups of estimates as
well as the fact that we have to use the same estimate of error variance. We shall introduce
the notion of quasi-independent tests of multiple hypotheses, which is appropriate in
such situations.


* The data were collected by Dr Hughes Bryan, Professor of Nutrition, School of Public Health,
University of North Carolina.




2. QUASI-INDEPENDENT TESTS OF MULTIPLE HYPOTHESES 
For any test of significance we consider the first and second kinds of error. Let the hypotheses
be

\[ H_1: \theta_1 = 0, \quad H_2: \theta_2 = 0, \quad H_3: \theta_3 = 0. \tag{2·1} \]

The tests $T_1, T_2, T_3$ of $H_1, H_2, H_3$ will be called quasi-independent when

\[ \begin{aligned} \mathrm{Prob}_1\{\text{acceptance } H_1 \mid \theta_1 \neq 0, \theta_2, \theta_3\} &= \mathrm{Prob}_1\{\text{acceptance } H_1 \mid \theta_1 \neq 0, \theta_2 = 0, \theta_3 = 0\},\\ \mathrm{Prob}_1\{\text{rejection } H_1 \mid \theta_1 = 0, \theta_2, \theta_3\} &= \mathrm{Prob}_1\{\text{rejection } H_1 \mid \theta_1 = 0, \theta_2 = 0, \theta_3 = 0\} \end{aligned} \tag{2·2} \]

hold for all values of $\theta_2$ and $\theta_3$, i.e. whether $H_2$ and $H_3$ are true or not; and similar equations
hold for $T_2$ and $T_3$, where $\mathrm{Prob}_1\{\ \}$ denotes the probability of the statement in braces
for the test procedure $T_1$, etc. Thus for quasi-independent tests of hypotheses $H_1$, $H_2$ and $H_3$
the first and second kinds of error for each hypothesis do not depend on the parameters of
the other hypotheses.

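Two tests of hypotheses $H_1$ and $H_2$ with the critical regions $C_1$ and $C_2$ are independent if
the following holds for all $\theta_1$ and $\theta_2$:

\[ \mathrm{Prob}(X \in C_1, X \in C_2 \mid \theta_1, \theta_2) = \mathrm{Prob}(X \in C_1 \mid \theta_1)\,\mathrm{Prob}(X \in C_2 \mid \theta_2), \]

X being the observed sample.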
It will be seen later that the usual analysis of variance tests for multiple hypotheses are 
quasi-independent, and we most often do not need independent tests. 


3. CONTROL OF ERRORS IN THE SIMULTANEOUS TESTS OF HYPOTHESES 


There may be different points of view for assigning significance levels, in the case of simultaneous
tests of hypotheses. In certain situations, where the decisions regarding the hypotheses
$H_1, H_2, \ldots$, etc., are unrelated, it is proper to consider the significance level of each
hypothesis individually at 5 or 1 % level (say). But when the decisions have a joint import,
it is proper to consider the first kind of error of the simultaneous test of $H_1$ and $H_2$ as the
rejection of at least one of the hypotheses, when all of them are in fact true.

The significance level of the simultaneous test is defined as the probability of rejecting at least
one of the hypotheses, when all of them are true.

Suppose in the test of a new variety of crop against a common variety used as a control 
we are interested in two different characters and the new variety may be considered desir- 
able when it is superior in either of these characters. Since one would be interested to know 
which particular character in the new variety is superior to the control, a simultaneous test 
should be done. In this case, the significance level, as some sort of measure of the amount 
of caution implied in the test, should naturally be that of the simultaneous test, since one 
would like to insure, at a certain level, against declaring the new variety superior, when it 
is actually no better than the control. 

A similar situation exists in quality control for acceptance of material, when a number 
of characters are examined and the material is rejected when it is not up to the mark with 
respect to any of these characters. It may be useful, in this case, to determine the particular 
deficiency in any of these characters, and so we must use a simultaneous test, and the 
significance level of the simultaneous test should be used to insure against too frequent 


rejections. Of course, we pay for this by widening confidence intervals for the parameters 
measuring these characters. 




The need for a safety device like the simultaneous significance level would be even more 
apparent for the agricultural or quality control example stated above, when the number of 
characters is large, in which case the chance of declaring a new variety different from the 
common variety may be much larger than 5%. However, the control of the first kind of 
error in a simultaneous test is achieved only at the expense of increasing the second kind 
of error for individual characters. In any particular problem, whether the point of view of 
individual or simultaneous significance level should be adopted depends, roughly, on how 
much a priori weight one attaches to the alternative hypotheses, either from theoretical 
expectations or from considerations of cost in replacing the old variety by the new variety. 
If this cost is small, e.g. if the varieties are grown only on an experimental scale, one would 
be relatively free to decide on either of the varieties as superior and the individual signi- 
ficance level would be the proper one. However, with varieties established in agricultural 
practice, any statistical decision in favour of a new variety would involve large expenditures 
and the statistician should take an attitude of caution. The simultaneous significance level 
takes account of this attitude and is thus appropriate in such cases. 
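
How quickly the chance of at least one false rejection grows with the number of characters can be seen in a simple special case. The sketch below assumes the individual tests are independent, which the text does not require and which is used here only for illustration; the function name is hypothetical.

```python
def simultaneous_level(individual_level, n_tests):
    """Probability of rejecting at least one true hypothesis among n_tests,
    assuming (for illustration only) that the tests are independent."""
    return 1.0 - (1.0 - individual_level) ** n_tests

for m in (1, 2, 5, 10, 20):
    print(m, round(simultaneous_level(0.05, m), 3))
```

With ten characters each tested at the 5 % level, the simultaneous level under this assumption is already about 40 %.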

In the analysis of variance problem considered before, the conventional statistical 
procedure of (a) testing for the overall SS for fitting constants, (b) testing separately for the 
SS of $H_1$, adjusting for $H_2$ at 5 % level, etc., would give quasi-independent tests at individual
levels of significance 5 %. But the problem of obtaining the simultaneous significance level
in this case is mathematically a very intractable one, and we shall find an upper bound for
the significance level. This would give a control of the first kind of error. The operating
characteristic would, of course, be of the same nature as for the usual analysis of variance
of multiple hypotheses, since the test procedure is essentially the same, except that the
exact significance level is not known but only its upper bound.

The notion of simultaneous level of significance has already been considered in various 
ways and languages by Scheffé (1953), Tukey (1952) and Nandi (1951), and the practical 
implications have been thoroughly discussed by Tukey in a mimeographed report. 


4, SIMULTANEOUS TESTS AND SIGNIFICANCE LEVELS 
Consider the linear model 
\[ E(y_i) = a_{i1}p_1 + \cdots + a_{im}p_m \quad (i = 1, \ldots, N), \tag{4·1} \]

$y_i$ being independent normal variables with unknown variance $\sigma^2$, and $p_1, \ldots, p_m$ are unknown
parameters. Let rank $(a_{ij}) = N_0$, and let $\pi_1, \ldots, \pi_R$ be estimable linear functions of the
parameters $p_1, \ldots, p_m$,

\[ \pi_k = l_{k1}p_1 + l_{k2}p_2 + \cdots + l_{km}p_m \quad (k = 1, 2, \ldots, R), \tag{4·2} \]

such that the coefficient vectors $(l_{k1}, \ldots, l_{km})$ form a vector space of rank $R \le N_0$. We consider
the following linear hypotheses,

\[ \begin{aligned} H_1 &: \pi_1 = 0, \ldots, \pi_{\kappa_1} = 0,\\ &\ \ \vdots\\ H_s &: \pi_{\kappa_1+\cdots+\kappa_{s-1}+1} = 0, \ldots, \pi_{\kappa_1+\cdots+\kappa_s} = 0. \end{aligned} \tag{4·3} \]

Let $Y_1, \ldots, Y_R$ be the best linear estimates of these parameters,
obtained by the method of least squares. We shall sometimes denote the coefficient vectors




of these linear functions by the same symbol, so that we have the alternative notation for
the linear function $Y_i = (Y_i, y)$, where $(Y_i, y)$ is the scalar product of the vector $Y_i$ and the
vector y. From Markoff's theorem we have an independent estimate of error variance,
$S^2$, with $n_e$ d.f., which is independent of the parameters $p_1, \ldots, p_m$. Quasi-independent tests
of the hypotheses $H_h$ can be made by considering the linear functions

\[ (Y_{b_{h-1}+1}, y), \ldots, (Y_{b_h}, y) \quad (b_h = \kappa_1 + \cdots + \kappa_h), \]

whose expectations are $\pi_{b_{h-1}+1}, \ldots, \pi_{b_h}$. Let $U_{b_{h-1}+1}, \ldots, U_{b_h}$ be orthonormal vectors forming
a basis of the vector space formed by $Y_{b_{h-1}+1}, \ldots, Y_{b_h}$. Then $(U_i, y)$ is a linear form in
$(Y_j, y)$ $(i, j = b_{h-1}+1, \ldots, b_h)$ and

\[ E(U_i, y) = \sum_{j=b_{h-1}+1}^{b_h} a_{ij}\,E(Y_j, y) = \sum_{j=b_{h-1}+1}^{b_h} a_{ij}\,\pi_j \quad \text{from } E(Y_j, y) = \pi_j. \tag{4·4} \]

On the hypothesis $H_h$,

\[ \chi_h^2/\sigma^2 = \sum_{j=b_{h-1}+1}^{b_h} [(U_j, y) - E(U_j, y)]^2/\sigma^2 \]

has a $\chi^2$-distribution with $\kappa_h$ d.f. and $\chi_h^2/(\kappa_h S^2)$ has an F-distribution with d.f. $(\kappa_h, n_e)$.
The second kind of error for the test of $H_h$ depends only upon the parameters $\pi_{b_{h-1}+1}, \ldots, \pi_{b_h}$.

We now consider sets $M_1, M_2, \ldots$ of vectors $Y_i$ belonging to
hypotheses $H_1, \ldots, H_s$, so that all vectors belonging to the same hypothesis $H_h$ belong to the
same set $M_i$. Vectors belonging to different sets are orthogonal, and if the vectors belonging
to two hypotheses $H_h$ and $H_k$ are orthogonal they belong to different sets. These sets we shall
call orthogonal sets and hypotheses belonging to two different sets, orthogonal hypotheses.
We shall consider different cases according as an orthogonal set consists of a single hypothesis
$H_h$ or more.

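Case I. Let all hypotheses $H_1, \ldots, H_s$ be orthogonal so that each orthogonal set $M_i$ consists
of a single hypothesis. Let $U_1, \ldots, U_{\kappa_1}$; $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$; ... be orthonormal systems
of vectors in the spaces determined by $Y_1, \ldots, Y_{\kappa_1}$; $Y_{\kappa_1+1}, \ldots, Y_{\kappa_1+\kappa_2}$; ..., respectively.
Let

\[ E((U_i, y)) = \Phi_i. \]

We shall consider for simplicity s = 3. The joint distribution of

\[ \chi_1^2 = \sum_{i=1}^{\kappa_1} [(U_i,y)-\Phi_i]^2, \quad \chi_2^2 = \sum_{i=\kappa_1+1}^{\kappa_1+\kappa_2} [(U_i,y)-\Phi_i]^2, \quad \chi_3^2 = \sum_{i=\kappa_1+\kappa_2+1}^{\kappa_1+\kappa_2+\kappa_3} [(U_i,y)-\Phi_i]^2 \tag{4·5} \]

and $S^2$ is given by

\[ \text{const} \times \exp\left[-\frac{1}{2\sigma^2}\left(\chi_1^2+\chi_2^2+\chi_3^2+n_e S^2\right)\right] (\chi_1^2)^{\kappa_1/2-1}(\chi_2^2)^{\kappa_2/2-1}(\chi_3^2)^{\kappa_3/2-1}(S^2)^{n_e/2-1}\, d\chi_1^2\, d\chi_2^2\, d\chi_3^2\, dS^2. \]

Put

\[ G_1 = \frac{\chi_1^2}{n_e S^2}, \quad G_2 = \frac{\chi_2^2}{n_e S^2}, \quad G_3 = \frac{\chi_3^2}{n_e S^2}. \]

Making the transformation, integration for $S^2$ gives the distribution of $G_1, G_2, G_3$ as

\[ C(\kappa_1,\kappa_2,\kappa_3; n_e)\, \frac{G_1^{\kappa_1/2-1}\, G_2^{\kappa_2/2-1}\, G_3^{\kappa_3/2-1}}{(1+G_1+G_2+G_3)^{(\kappa_1+\kappa_2+\kappa_3+n_e)/2}}\, dG_1\, dG_2\, dG_3. \tag{4·6} \]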




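We now consider the region defined by $G_1 \le \lambda_1$, $G_2 \le \lambda_2$, $G_3 \le \lambda_3$, which has the probability
$P(\lambda_1, \lambda_2, \lambda_3)$, say. If $P(\lambda_1, \lambda_2, \lambda_3) = 1-\alpha$, we have the system of simultaneous confidence
regions $C_1, C_2, \ldots$ for the sets of parameters with the confidence coefficient $1-\alpha$, as

\[ \begin{aligned} C_1 &: \pi_1, \ldots, \pi_{\kappa_1}: & \sum_{i=1}^{\kappa_1} [(U_i,y)-\Phi_i]^2 &\le \lambda_1 n_e S^2,\\ C_2 &: \pi_{\kappa_1+1}, \ldots, \pi_{\kappa_1+\kappa_2}: & \sum_{i=\kappa_1+1}^{\kappa_1+\kappa_2} [(U_i,y)-\Phi_i]^2 &\le \lambda_2 n_e S^2,\\ C_3 &: \pi_{\kappa_1+\kappa_2+1}, \ldots, \pi_{\kappa_1+\kappa_2+\kappa_3}: & \sum_{i=\kappa_1+\kappa_2+1}^{\kappa_1+\kappa_2+\kappa_3} [(U_i,y)-\Phi_i]^2 &\le \lambda_3 n_e S^2. \end{aligned} \tag{4·7} \]

The best choice of $\lambda_1, \lambda_2, \lambda_3$, i.e. to give the smallest confidence regions, is not known and
needs further investigation. From considerations of degrees of freedom we may consider
$\lambda_1/\kappa_1 = \lambda_2/\kappa_2 = \lambda_3/\kappa_3 = \lambda$.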

5. EVALUATION OF SIGNIFICANCE LEVEL 
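To determine λ from (4·6) we have

\[ C(\kappa_1,\kappa_2,\kappa_3; n_e) \int_0^{\kappa_1\lambda}\!\!\int_0^{\kappa_2\lambda}\!\!\int_0^{\kappa_3\lambda} \frac{G_1^{\kappa_1/2-1}\, G_2^{\kappa_2/2-1}\, G_3^{\kappa_3/2-1}}{(1+G_1+G_2+G_3)^{(\kappa_1+\kappa_2+\kappa_3+n_e)/2}}\, dG_1\, dG_2\, dG_3 = 1-\alpha, \tag{5·1} \]

where α is the significance level. Put $t_1 = G_1$, $t_2 = G_2/(1+G_1)$, $t_3 = G_3/(1+G_1+G_2)$; then the
above becomes

\[ C(\kappa_1,\kappa_2,\kappa_3; n_e) \int_0^{\kappa_1\lambda} \frac{t_1^{\kappa_1/2-1}\,dt_1}{(1+t_1)^{(\kappa_1+n_e)/2}} \int_0^{\kappa_2\lambda/(1+t_1)} \frac{t_2^{\kappa_2/2-1}\,dt_2}{(1+t_2)^{(\kappa_1+\kappa_2+n_e)/2}} \int_0^{\kappa_3\lambda/\{(1+t_1)(1+t_2)\}} \frac{t_3^{\kappa_3/2-1}\,dt_3}{(1+t_3)^{(\kappa_1+\kappa_2+\kappa_3+n_e)/2}} = 1-\alpha, \tag{5·2} \]

which can be evaluated by successive numerical integration and using Pearson's Tables of
Incomplete B-functions.

For any confidence coefficient $1-\alpha$ the calculation of λ depends upon calculations of
iterated integrals, and tables have to be prepared for these. One difficulty of preparing such
tables is that the integrals depend upon the parameters $\kappa_1, \kappa_2, \kappa_3, n_e$, etc., and thus unless
the number of parameters can be reduced, construction of tables would be difficult. We may,
however, get an upper bound for the significance level α, from an inequality of Kimball
(1951) given below:

\[ \Pr\{|G_1| \le \lambda_1, |G_2| \le \lambda_2, |G_3| \le \lambda_3\} \ge \Pr\{|G_1| \le \lambda_1\}\,\Pr\{|G_2| \le \lambda_2\}\,\Pr\{|G_3| \le \lambda_3\}. \tag{5·3} \]

Here we choose $\lambda_1, \lambda_2, \lambda_3$ so that

\[ \Pr\{|G_1| \le \lambda_1\} = \Pr\{|G_2| \le \lambda_2\} = \Pr\{|G_3| \le \lambda_3\}. \tag{5·4} \]

Thus if $\Pr\{|G_1| \le \lambda_1, |G_2| \le \lambda_2, |G_3| \le \lambda_3\}$ is to be at least 0·95 we have to make
$\Pr\{|G_1| \le \lambda_1\} = 0·983$, since $(0·983)^3 \approx 0·95$.

Now $n_e G_1/\kappa_1$ has an F-distribution with $\kappa_1$ and $n_e$ d.f. so that $\lambda_1 = (\kappa_1/n_e)\,F_0(\kappa_1, n_e)$, where $F_0(\kappa, n_e)$
is the 2 % point of the F-distribution with d.f. $(\kappa, n_e)$. We thus have

\[ \Pr\left\{|G_1| \le \frac{\kappa_1}{n_e}F_0(\kappa_1, n_e),\ |G_2| \le \frac{\kappa_2}{n_e}F_0(\kappa_2, n_e),\ |G_3| \le \frac{\kappa_3}{n_e}F_0(\kappa_3, n_e)\right\} \ge (0·983)^3 \approx 0·95. \]

Thus

\[ \alpha \le 0·05. \tag{5·5} \]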
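
The bound can be illustrated by simulation. The sketch below is a Monte Carlo illustration under stated assumptions, not the paper's computation: the degrees of freedom `kappas` and `n_e` are hypothetical, the critical values λ are taken as empirical quantiles of the simulated G's rather than from F tables, and each G is generated as a chi-square variate divided by a common error sum of squares.

```python
import random

def simulate_G(kappas, n_e, n_rep, seed=0):
    """Draw (G_1, ..., G_s), each a chi-square with kappa_j d.f. divided by the
    same chi-square with n_e d.f. (playing the role of n_e S^2 / sigma^2)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_rep):
        error_ss = rng.gammavariate(n_e / 2.0, 2.0)      # chi-square, n_e d.f.
        draws.append([rng.gammavariate(kap / 2.0, 2.0) / error_ss
                      for kap in kappas])
    return draws

def empirical_quantile(values, p):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]

per_test = 0.95 ** (1.0 / 3.0)            # about 0.983 for three hypotheses
kappas, n_e = (24, 3, 1), 10              # hypothetical degrees of freedom
draws = simulate_G(kappas, n_e, 20_000, seed=1)

lambdas = [empirical_quantile([d[j] for d in draws], per_test) for j in range(3)]
marginals = [sum(d[j] <= lambdas[j] for d in draws) / len(draws) for j in range(3)]
joint = sum(all(d[j] <= lambdas[j] for j in range(3)) for d in draws) / len(draws)
product = marginals[0] * marginals[1] * marginals[2]
print(round(per_test, 3), joint, product)
```

Because the three ratios share one denominator they are positively dependent, so the simulated joint probability should not fall below the product of the marginal probabilities, which is what the inequality asserts.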




Case II. We now consider a set M, which consists of more than one group of linear fune- 
tions, say it contains linear functions corresponding to the hypotheses H, and H,. We shall 
call this a compound set. In this case the linear functions belonging to H, and H, are non- 
' orthogonal. We shall show that there is no non-singular linear transformation by which 
these linear functions can be transformed into mutually orthogonal sets corresponding to 
hypotheses H, and H, respectively. Thus the basic inequality (5-3) is not directly applicable, 

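Let $U_1, \ldots, U_{\kappa_1}$; $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$ be an orthonormal basis of the vector space formed by $Y_1, \ldots, Y_{\kappa_1}$; $Y_{\kappa_1+1}, \ldots, Y_{\kappa_1+\kappa_2}$, so that $U_1, \ldots, U_{\kappa_1}$ is a basis of the vector space formed by $Y_1, \ldots, Y_{\kappa_1}$. As before, let $E\{(U_i, y)\} = \Phi_i$; then the $\Phi_i$ are linear functions of $\pi_1, \ldots, \pi_{\kappa_1}$; $\pi_{\kappa_1+1}, \ldots, \pi_{\kappa_1+\kappa_2}$.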


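Let $\bar Y_i$ be the normalized vector corresponding to $Y_i$, i.e. $\bar Y_i = Y_i/|Y_i|$, where $|Y_i|$ is the norm of the vector $Y_i$, and let $\bar\pi_i = \pi_i/|Y_i|$. The relation between U-vectors and Y-vectors is expressed by
$$\begin{pmatrix}\bar Y_{(1)}\\ \bar Y_{(2)}\end{pmatrix} = \begin{pmatrix}\alpha & 0\\ \beta & \gamma\end{pmatrix}\begin{pmatrix}U_{(1)}\\ U_{(2)}\end{pmatrix}, \tag{5·6}$$
where $\alpha = (\alpha_{ij})$ is a $(\kappa_1 \times \kappa_1)$ matrix, $\gamma = (\gamma_{ij})$ is a $(\kappa_2 \times \kappa_2)$ matrix, $\beta = (\beta_{ij})$ is a $(\kappa_2 \times \kappa_1)$ matrix, $\bar Y_{(1)}$ is a $(1 \times \kappa_1)$ matrix with components $\bar Y_1, \ldots, \bar Y_{\kappa_1}$, $\bar Y_{(2)}$ is a $(1 \times \kappa_2)$ matrix with components $\bar Y_{\kappa_1+1}, \ldots, \bar Y_{\kappa_1+\kappa_2}$, and similarly $U_{(1)}$ is a $(1 \times \kappa_1)$ matrix with components $U_1, \ldots, U_{\kappa_1}$, etc.
From the choice of the basis $U_1, \ldots, U_{\kappa_1}$; $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$ it is clear that the matrices $\alpha$ and $\gamma$ are non-singular and $\beta \ne 0$, since the vectors $Y_1, \ldots, Y_{\kappa_1}$ and $Y_{\kappa_1+1}, \ldots, Y_{\kappa_1+\kappa_2}$ are non-orthogonal.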


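Inverting the equation (5·6) we get
$$\begin{pmatrix}U_{(1)}\\ U_{(2)}\end{pmatrix} = \begin{pmatrix}\alpha^{-1} & 0\\ -\gamma^{-1}\beta\alpha^{-1} & \gamma^{-1}\end{pmatrix}\begin{pmatrix}\bar Y_{(1)}\\ \bar Y_{(2)}\end{pmatrix}. \tag{5·7}$$
Replacing the vectors $\bar Y_i$ by $\bar\pi_i$ and $U_i$ by $\Phi_i$, the corresponding relation must hold between the $\Phi$'s and the $\bar\pi$'s. Hence $\Phi_1, \ldots, \Phi_{\kappa_1}$ depend upon $\bar\pi_1, \ldots, \bar\pi_{\kappa_1}$ alone but $\Phi_{\kappa_1+1}, \ldots, \Phi_{\kappa_1+\kappa_2}$ cannot be expressed solely in terms of $\bar\pi_{\kappa_1+1}, \ldots, \bar\pi_{\kappa_1+\kappa_2}$, since $\gamma^{-1}\beta\alpha^{-1} \ne 0$. Thus the sets of linear functions $U_1, \ldots, U_{\kappa_1}$ and $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$ do not provide quasi-independent tests of $H_1$ and $H_2$. We shall see later in II (c) that this can be done when the transformation is singular, i.e. when we choose a suitable subset of $\kappa_2 - \kappa_1$ vectors from the vectors $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$.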
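The form of the inverse of a lower block-triangular matrix used in this argument can be checked numerically. The sketch below takes the simplest illustrative case, $\kappa_1 = \kappa_2 = 1$, so that $\alpha$, $\beta$, $\gamma$ are scalars; the numerical values are hypothetical and are chosen only so that the rows have unit length.

```python
def matmul(a, b):
    """Product of two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Hypothetical scalars: alpha**2 = 1 and beta**2 + gamma**2 = 1, with beta != 0.
alpha, beta, gamma = 1.0, 0.6, 0.8

M = [[alpha, 0.0],
     [beta, gamma]]

# Claimed inverse: [[alpha^-1, 0], [-gamma^-1 beta alpha^-1, gamma^-1]].
M_inv = [[1.0 / alpha, 0.0],
         [-beta / (gamma * alpha), 1.0 / gamma]]

product = matmul(M, M_inv)
print(product)

# The lower-left entry of the inverse is non-zero whenever beta != 0, which is
# why the second group of Phi's cannot be expressed through the second group
# of parameters alone.
print(M_inv[1][0] != 0.0)  # True
```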

We shall also consider two other methods of setting inequalities to reduce the problem to orthogonal sets. The essence of these methods is to cover a complicated region C by a sphere S and to use the probability of the sphere for the region C as an upper bound.


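Method (a). Method of simultaneous confidence intervals for all contrasts
This method is due to Scheffé (1953), Tukey (1952) and Roy & Bose (1953). It can be used for the simultaneous test but it also gives the confidence intervals for all linear functions of the parameters $\pi_1, \ldots, \pi_{\kappa_1}$; $\pi_{\kappa_1+1}, \ldots, \pi_{\kappa_1+\kappa_2}$ simultaneously with a given confidence coefficient. In equation (5·6), since both $U$'s and $\bar Y$'s are normalized vectors,
$$\left.\begin{aligned}\sum_{j=1}^{\kappa_1}\alpha_{nj}^2 &= 1 \quad (n = 1, \ldots, \kappa_1),\\ \sum_{j=1}^{\kappa_1}\beta_{nj}^2 + \sum_{j=1}^{\kappa_2}\gamma_{nj}^2 &= 1 \quad (n = 1, \ldots, \kappa_2).\end{aligned}\right\} \tag{5·8}$$
Also
$$\left.\begin{aligned}(\bar Y_n, y) &= \sum_{j=1}^{\kappa_1}\alpha_{nj}(U_j, y) \quad (n = 1, 2, \ldots, \kappa_1),\\ (\bar Y_{\kappa_1+m}, y) &= \sum_{j=1}^{\kappa_1}\beta_{mj}(U_j, y) + \sum_{j=1}^{\kappa_2}\gamma_{mj}(U_{\kappa_1+j}, y) \quad (m = 1, 2, \ldots, \kappa_2),\end{aligned}\right\} \tag{5·9}$$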


M. N. GHOSH 447
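and similar equations hold between $\bar\pi$'s and $\Phi$'s. From Schwarz's inequality
$$[(\bar Y_n, y) - \bar\pi_n]^2 \le \sum_{i=1}^{\kappa_1+\kappa_2}[(U_i, y) - \Phi_i]^2 \quad (n = 1, 2, \ldots, \kappa_1+\kappa_2).$$
Hence
$$\Pr\{[(\bar Y_n, y) - \bar\pi_n]^2 \le \delta\ \ (n = 1, \ldots, \kappa_1+\kappa_2)\} \ge \Pr\Big\{\sum_{i=1}^{\kappa_1+\kappa_2}[(U_i, y) - \Phi_i]^2 \le \delta\Big\}. \tag{5·10}$$
For the set M, we now use $\chi^2 = \sum_{j=1}^{\kappa_1+\kappa_2}[(U_j, y) - \Phi_j]^2$ in (4·7). This method has the advantage that if $\pi$ is a linear function of $\bar\pi$'s, then at the same time, with the same confidence coefficient, we have the confidence interval for $\pi$,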


"n Aiñ + nee Fertig ep eng? | (5-11) 
OT, + 1.4 Onan Tas E020). 
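Method (b)
From the equation (5·6) we have
$$\bar Y_{(2)} = \beta U_{(1)} + \gamma U_{(2)}. \tag{5·12}$$
Since the matrix $\gamma$ is non-singular
$$\gamma^{-1}\bar Y_{(2)} = \gamma^{-1}\beta U_{(1)} + U_{(2)}. \tag{5·13}$$
Let $\gamma^{-1}\bar Y_{(2)} = Z$, which is a column vector with components $Z_{\kappa_1+1}, \ldots, Z_{\kappa_1+\kappa_2}$, and let $\bar Z_{\kappa_1+1}, \ldots, \bar Z_{\kappa_1+\kappa_2}$ be the corresponding normalized vectors, then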


(uas у) = Bi (Gr у) +... + fL US 9) + (Usa y) | 


i d : B (5-14) 
(Žare? y) = Br, (Us y) +... fas Us y) +у (Ок у), 


во that 5 pi +y = 1 for n= кү+1,...,Кку+ К, and similar equations hold between the 
ї=1 у А 
expected values y; = Ё(д„ у) and Ф, = E(U;, y). Hence from Schwartz’s inequality 


Zn» Y)— Š (Uy) 0: [(0,9)- Ф, (т=кү+1,...к 4 Ky), 
[n Y) "< XU y) - D+ [Un 9) 1 1Ka T 


Kı FKa Kı күк 
Ў t, Ves EIU) -Ф+ È Or Ф 
1 n= n= 


п=ку+ 


Since {г„} are linear functions of T aso Tau 2 EG y) (07 fit 1,...) are уна 
functions of parameters 7,, +1» ++ Лкү+ку› ® confidence region for these parameters is given by 


Ki кука : 
CE ense Sov) OP, 2, 01-0746 (5-16) 


while a confidence region for the parameters My, «+» Mks is given by 


$ [Tuy - OP «G- (5-17) 
n=1 


When there are more than two hypotheses in the set №, e.g. if there are three hypotheses 
H,, Hy, Н, with the groups of parameters лу,...›7,; Tini ў 
with best linear estimates Y,, -.- Yr, Yaga Oro У, чкі tio ett 


ракаў Пака °°? кука 
we use the same 


method as before for the hypothesis H, and by combining the hypotheses Н, and H,. We then 
have the following confidence regions for the three groups of parameters: 

(т, т): X Uny) OP «6, 


Ks 


Ute ifs g ce, eny) —thaP< 2 [(U5,y)- OP + ў KU y) - OP < Oa, 


{=к,+ 
Kı Katka кү+к, Ki FKa KR Ж 
(T, ++» "эу канака): x [(2„› Y)- ml < Ks > [(U;, y) = OP + > [(U; y) 13 OP «6. 
n=x, +r +l i=1 tan teat 
(5-18) 
For the test of the hypotheses 
H: 7,—0, ..., п, = 0 
Fh: SO 25 Tek, = 0 
Ea: Tare = 9 ..., Тек = 0 
we consider is rà (Uny), F = xil S2), 
кука 
M-,E (00% Eos. e 
Ki thy tks 
x= У py), Fy = (kS), 
{=кү+к,+1 


Since Ф, = 0 on the given hypotheses, for all i = 1,2, ..., Ky Ka Ks. If the «% point of 
Fı, Fs, Е, (corresponding to @,, Gy, G) are бү, є, Єз, then we consider the significance limits 


1 1 1 
of = x. s XL a Xã for the test of hypotheses H,, Н, Н, respectively as n, S2e,, n, 5,6, +63) 
and „$$ (күєү + кє, + 65). The upper bound of the significance level is then ж. 


Method (c) 

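Let $U_1, \ldots, U_{\kappa_1}$; $U_{\kappa_1+1}, \ldots, U_{\kappa_1+\kappa_2}$ be an orthonormal basis of the space formed by the vectors $Y_1, \ldots, Y_{\kappa_1}$; $Y_{\kappa_1+1}, \ldots, Y_{\kappa_1+\kappa_2}$, so that $U_1, \ldots, U_{\kappa_1}$ is a basis of the space of the vectors $Y_1, \ldots, Y_{\kappa_1}$. We shall show that when $\kappa_1 < \kappa_2$, it is possible to find orthogonal sets of $\kappa_1$ and $\kappa_2 - \kappa_1$ linear functions for the hypotheses $H_1$ and $H_2$, so that the inequality (5·3) could be used.

In (5·6), since the rank of $\beta$ is at most $\kappa_1$, by a transformation of the vectors $\bar Y_{\kappa_1+1}, \ldots, \bar Y_{\kappa_1+\kappa_2}$ the matrix $\beta$ can be reduced to the triangular form with $\kappa_2 - \kappa_1$ rows of zeros, i.e. the transformed vectors $Y'_{\kappa_1+1}, \ldots, Y'_{\kappa_1+\kappa_2}$ would be given by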


Ya = AGUA Uat. +p Ua tYnU.a +...+Y aU, 


: күк! 
Fara = Bis Us + ...+ Bic Ue +910 +... + Vie Veter 
; jas RN etd dirimere RE acne ase (5-20) 
2+1 = Укчу Оча +... Ук, ка Шан» 
Y, * У а +... EY Шан 
Let W,,..., We, form an orthonormal basis of the vector space of Y. Sas «2 Vader D 


then have tL 
EHD} = Xm = Ф, Gris ng (5:21) 




The hypothesis Hg T4720, e, Mase, = 0, 
implies Hy: 0,420, ., Ona =O, 
since Ф, 21, ..., Ф, + are linear functions of m., +1 ...,7,,,,, and have the rank ку-ку. Now 


Н, (i.e. rejection of Нз) implies Н, (i.e. rejection of H,). We therefore test for the hypothesis 
Н; and reject H, only when H; is rejected. A confidence region for the parameters л, а, ..., 


к-к, 
7,,..,, Ì8 obtained from XÀ [(W, y) — Drs]? which is distributed independently of 


È (Uy) - Oa, 
i=l 


and these sums of squares may be used in equations (5-3) and (4-7). 


6. RELATIVE MERITS OF METHODS (a), (b) AND (c)

The method (b) obviously gives a much closer inequality than (a), although the latter has the additional advantage that it gives confidence intervals for all contrasts of the parameters in the hypotheses $H_1$ and $H_2$ in the set M. However, in most practical cases, the parameters entering into the hypotheses $H_1$ and $H_2$ relate to quite distinct characters, and thus there would not, usually, be any need to consider contrasts involving both sets of parameters.

In method (c) we use orthogonal sets of linear functions $U_1, \ldots, U_{\kappa_1}$; $W_1, \ldots, W_{\kappa_2-\kappa_1}$ and form $\chi^2$ with $\kappa_1$ and $\kappa_2 - \kappa_1$ d.f. from these instead of using $\chi^2$'s with $\kappa_1$ and $\kappa_2$ d.f. from the linear functions $Y_1, \ldots, Y_{\kappa_1}$; $Y_{\kappa_1+1}, \ldots, Y_{\kappa_1+\kappa_2}$, which would have given quasi-independent tests for these hypotheses. The loss of degrees of freedom is serious when $\kappa_2 - \kappa_1$ is small, as the F-table shows. Thus the method (c) gives a good result only when $\kappa_2 - \kappa_1$ is not too small.

On the other hand, the method (b) gives a wide inequality unless $\kappa_1$ is small. It is not possible, however, to make a simple quantitative statement about the relative advantages of the methods (b) and (c) without detailed numerical calculations. One could, however, follow a tentative rule for the use of these methods:

    κ1 small        κ2 − κ1 small        use (b),
    κ1 small        κ2 − κ1 not small    use (b) or (c),
    κ1 not small    κ2 − κ1 small        use (b) or (c),
    κ1 not small    κ2 − κ1 not small    use (c).
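The tentative rule may be written as a small lookup, sketched below. The paper gives no numerical threshold for "small", so the two arguments are left as judgements supplied by the user; the function name is illustrative only.

```python
def recommended_methods(kappa1_small: bool, difference_small: bool) -> tuple:
    """Methods suggested by the tentative rule.

    kappa1_small     -- whether kappa_1 is judged small
    difference_small -- whether kappa_2 - kappa_1 is judged small
    """
    if kappa1_small and difference_small:
        return ("b",)
    if not kappa1_small and not difference_small:
        return ("c",)
    # Mixed cases: the rule allows either method.
    return ("b", "c")


print(recommended_methods(True, True))    # ('b',)
print(recommended_methods(False, False))  # ('c',)
```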


The idea of this paper arose in the analysis of the data referred to in the text and in discussions with Dr Hughes Bryan and Dr B. G. Greenberg. The numerical details of the calculations will be published in a subsequent paper. The author wants to put on record his appreciation of helpful discussions with Dr Hughes Bryan and Dr B. G. Greenberg, of the University of North Carolina, and of help from the editorial board of Biometrika in improving the presentation of this paper.


REFERENCES
KIMBALL, A. W. (1951). On dependent tests of significance in the analysis of variance. Ann. Math. Statist. 22, 600.
NANDI, H. K. (1951). On the analysis of variance test. Calcutta Statist. Ass. Bull. 3, 103.
ROY, S. N. & BOSE, R. C. (1953). Simultaneous confidence interval estimation. Ann. Math. Statist. 24, 513.
SCHEFFÉ, H. (1953). A method for judging all contrasts in the analysis of variance. Biometrika, 40, 87.
TUKEY, J. (1952). Allowances for various types of error rates. Unpublished invited address, Blacksburg meeting of the Institute of Mathematical Statistics, March 1952.


[ 450 ] 


RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS 


III. SOME LARGE-SAMPLE RESULTS ON ESTIMATION AND POWER
FOR A METHOD OF PAIRED COMPARISONS*


By RALPH ALLAN BRADLEY 
Virginia Agricultural Experiment Station of the Virginia Polytechnic Institute 


I. INTRODUCTION 

1·1. Rank analysis of incomplete block designs
A new method for paired comparisons was discussed in two recent papers (Bradley & Terry, 1952; Terry, Bradley & Davis, 1952). A mathematical model was postulated and tests of significance were developed. The procedures were considered as special cases of a rank analysis of incomplete block designs, and many of the concepts may be extended from the limited considerations of paired comparisons to ranking in incomplete block designs with two or more treatments in an incomplete block. The appropriateness of the model for paired comparisons has been discussed by Hopkins (1954) and also by Bradley (1954a). A large section of tables for a test of significance on the equality of treatment effects was included in the first reference cited and additional tables were prepared by the present author (1954b). For all of the tests considered in the above-referenced work, large-sample approximations to the sampling distributions of the statistics, under the conditions of the null hypotheses, have been given.


1·2. Review of the method of paired comparisons
It is necessary to summarize the mathematical model of the method of paired comparisons in order to outline the objectives of this paper.

We postulated true treatment ratings or parameters, $\pi_1, \ldots, \pi_t$, for $t$ treatments in an experiment involving paired comparisons. It was assumed that every $\pi_i \ge 0$, and, for convenience, that† $\sum_i \pi_i = 1$. Further definition followed with the assumption that, when treatment $i$ appears with treatment $j$ in a block, the probability that treatment $i$ obtains the higher rating (or a rank of 1) is $\pi_i/(\pi_i + \pi_j)$. Assuming independence in probability of treatment comparisons, we wrote the likelihood function in its general form as
$$L = \prod_i \pi_i^{a_i} \prod_{i<j} (\pi_i + \pi_j)^{-n}, \tag{1}$$
where
$$a_i = 2n(t-1) - \sum_j{}' \sum_{k=1}^{n} r_{ijk}. \tag{2}$$
$r_{ijk}$ is the rank of treatment $i$ in the comparison with treatment $j$ in the $k$th of $n$ repetitions of the paired comparisons design.

* This project was supported by funds from the Research and Marketing Act of 1946, under Contract with the Agricultural Research Service, United States Department of Agriculture.
† $\sum_i$ and $\prod_i$ will indicate respectively sums and products with $i = 1, \ldots, t$. $\sum_j'$ will mean that the value of $i$ that appears in the argument of the summation is omitted. $\prod_{i<j}$ or $\prod_{i \ne j}$ represent products with $i = 1, \ldots, t$; $j = 1, \ldots, t$, and $i < j$ or $i \ne j$ respectively. Departures from these conventions will be specified.
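The model just reviewed can be written out as a short sketch. It assumes that the data are supplied as rank sums per treatment, from which the $a_i$ of (2) follow; the function names and the numerical example ($t = 3$, $n = 2$) are hypothetical.

```python
import math


def win_probability(pi, i, j):
    """Probability that treatment i obtains rank 1 when compared with j."""
    return pi[i] / (pi[i] + pi[j])


def wins_from_rank_sums(rank_sums, n, t):
    """a_i of (2): 2n(t-1) minus the sum of ranks received by treatment i."""
    return [2 * n * (t - 1) - r for r in rank_sums]


def log_likelihood(pi, a, n):
    """ln L from (1): sum_i a_i ln pi_i - n sum_{i<j} ln(pi_i + pi_j)."""
    t = len(pi)
    value = sum(a_i * math.log(p_i) for a_i, p_i in zip(a, pi))
    value -= n * sum(math.log(pi[i] + pi[j])
                     for i in range(t) for j in range(i + 1, t))
    return value


# Hypothetical rank sums for t = 3 treatments and n = 2 repetitions.
a = wins_from_rank_sums([5, 6, 7], n=2, t=3)
print(a)  # [3, 2, 1]
print(log_likelihood([0.5, 0.3, 0.2], a, n=2))
```

Each $a_i$ is simply the number of times treatment $i$ obtained rank 1, so the $a_i$ sum to $\tfrac12 nt(t-1)$, the total number of comparisons.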


The method of maximum likelihood was used to obtain estimators, $p_1, \ldots, p_t$, of the parameters, $\pi_1, \ldots, \pi_t$, and likelihood ratio tests have been used throughout discussions based on the model set down. A general class of tests of the null hypothesis,
$$H_0\colon\ \pi_i = 1/t \quad (i = 1, \ldots, t),$$
against the alternative hypothesis,
$$H_a\colon\ \pi_i = \pi(h) \quad (h = 1, \ldots, m;\ i = s_{h-1}+1, \ldots, s_h),$$
where $s_0 = 0$, $s_m = t$ and $\sum_{h=1}^{m}(s_h - s_{h-1})\,\pi(h) = 1$, was formulated. These are, of course, tests of the indistinguishability of treatment effects on some attribute, perhaps colour or flavour, of the treatments. Two special cases were considered by Bradley & Terry (1952) and the specialization comes under $H_a$.

Case (i): The hypothesis $H_a$ becomes

$H_1$: No $\pi_i$ is assumed equal to any $\pi_j$ $(i \ne j)$;

that is, in $H_a$, $m = t$.

The normal equations resulting from the use of the method of maximum likelihood reduced to
$$\frac{a_i}{p_i} - n\sum_j{}' (p_i + p_j)^{-1} = 0 \quad (i = 1, \ldots, t) \tag{3}$$
and
$$\sum_i p_i = 1. \tag{4}$$
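Equations (3) and (4) have no closed-form solution in general. One way to solve them numerically is the fixed-point iteration sketched below, which rewrites (3) as $p_i = a_i / [n\sum_j' (p_i + p_j)^{-1}]$ and rescales to satisfy (4) at each step. This scheme is an assumption made for illustration and is not claimed to be the computing method of the original papers; it also assumes every $a_i > 0$.

```python
def fit_ratings(a, n, max_iter=1000, tol=1e-12):
    """Iterate p_i <- a_i / (n * sum_{j != i} 1/(p_i + p_j)), then rescale."""
    t = len(a)
    p = [1.0 / t] * t
    for _ in range(max_iter):
        new = [a[i] / (n * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i))
               for i in range(t)]
        total = sum(new)
        new = [v / total for v in new]          # enforce (4)
        change = max(abs(x - y) for x, y in zip(new, p))
        p = new
        if change < tol:
            break
    return p


def normal_equation_residuals(p, a, n):
    """Left-hand side of (3) for each treatment; near zero at the solution."""
    t = len(p)
    return [a[i] / p[i] - n * sum(1.0 / (p[i] + p[j]) for j in range(t) if j != i)
            for i in range(t)]


# Hypothetical data: t = 3, n = 2, with a = (3, 2, 1).
p_hat = fit_ratings([3, 2, 1], n=2)
print(p_hat)
print(normal_equation_residuals(p_hat, [3, 2, 1], n=2))
```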


In this paper we shall be particularly interested in the statistic* $T = -2\ln\lambda_1$, where $\lambda_1$ is the likelihood ratio for this special test. In this case
$$T = nt(t-1)\ln 2 - 2B_1 \ln 10, \tag{5}$$
with†
$$B_1 = n\sum_{i<j}\log(p_i + p_j) - \sum_i a_i \log p_i. \tag{6}$$
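As an illustration of (5) and (6), the sketch below evaluates $T$ for supplied estimates, with $B_1$ computed in common logarithms. The inputs are hypothetical; when the estimates equal $1/t$ and the $a_i$ are equal the statistic should vanish, which gives a simple check on the constants in (5).

```python
import math


def likelihood_ratio_statistic(p, a, n):
    """T of (5), with B_1 of (6) computed in common logarithms."""
    t = len(p)
    b1 = n * sum(math.log10(p[i] + p[j])
                 for i in range(t) for j in range(i + 1, t))
    b1 -= sum(a_i * math.log10(p_i) for a_i, p_i in zip(a, p))
    return n * t * (t - 1) * math.log(2.0) - 2.0 * b1 * math.log(10.0)


# Hypothetical case: t = 3, n = 2, estimates equal to the null value 1/t.
t_null = likelihood_ratio_statistic([1/3, 1/3, 1/3], [2, 2, 2], 2)
print(abs(t_null) < 1e-9)  # True
```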


Case (ii): $H_a$ becomes
$$H_2\colon\ \pi_i = \pi \quad (i = 1, \ldots, s); \qquad \pi_i = (1 - s\pi)/(t - s) \quad (i = s+1, \ldots, t).$$
The general hypothesis $H_a$ is thus restricted to the case in which $m = 2$. The estimator $p$ of $\pi$ was given as
$$p = \frac{ns(4t - s - 3) - 2\sum_{i=1}^{s}\sum_j{}'\sum_{k=1}^{n} r_{ijk}}{ns(5st - 2t^2 - 6s + 3t) - 2(2s - t)\sum_{i=1}^{s}\sum_j{}'\sum_{k=1}^{n} r_{ijk}}. \tag{7}$$


A test statistic and an approximate test procedure were set forth in the reference. It will not be necessary to review these methods in view of the remarks that follow in § II.

Abelson & Bradley (1954) considered factorial arrangements of treatments imposed on 
the paired comparisons design. We shall not consider that situation in this paper. 

1·3. Objectives
The objectives of this paper evolve from the need of considering the behaviour of the 


developed tests of significance and estimates of population parameters when the assump- 


tions of the null hypotheses may not be true. We shall be interested in the power of the test 


* By log and ln we indicate common and natural logarithms respectively.
† $B_1$ has been tabulated for small samples. See Bradley & Terry (1952) and Bradley (1954b).




In the initial reference, it has been shown that $T$, in (5), under $H_0$, has the distribution of $\chi^2$ with $(t-1)$ degrees of freedom for large samples. The difficulty to be expected in attempting an exact evaluation of the power of the test based on $T$ for small samples, even for very restricted sets of alternative values of the parameters, was also discussed. Accordingly, we limit our objectives here to a large-sample evaluation and comparison of power functions and to the estimation of variances and covariances using large-sample results.

We shall show that Case (ii) yields the 'sign test', the properties of which are well known, and we may then limit our attention to Case (i).


II. CASE (ii) AND THE SIGN TEST
2·1. Case (ii)
We refer to $H_2$: $\pi_i = \pi$ $(i = 1, \ldots, s)$; $\pi_i = (1 - s\pi)/(t - s)$ $(i = s+1, \ldots, t)$ and the estimator $p$ of $\pi$ given in (7). It was not originally noted that the test procedure here reverts back to a binomial test.

Each comparison between two treatments of the first group of $s$ treatments yields a contribution of 3 to the sums of ranks in (7). Then, if $X$ is defined to be the number of times a treatment of the first group ranks above (obtains rank 1) a treatment of the second group,
$$\sum_{i=1}^{s}\sum_j{}'\sum_{k=1}^{n} r_{ijk} = \frac{3ns(s-1)}{2} + 2ns(t-s) - X. \tag{8}$$
Substitution of (8) in (7) yields
$$p = X/[(2s - t)X + ns(t-s)^2]. \tag{9}$$


Now, from the model for paired comparisons, the probability that a treatment $i$ of the first group ranks above a treatment $j$ of the second group in any of the $n$ repetitions is
$$P(r_{ijk} = 1) = \frac{\pi}{\pi + (1 - s\pi)/(t-s)} = \frac{\pi(t-s)}{1 + (t - 2s)\pi} \tag{10}$$
$$(i = 1, \ldots, s;\ j = s+1, \ldots, t;\ k = 1, \ldots, n).$$
When we substitute the estimator $p$ of $\pi$ given in (9) in the form (10), we obtain the estimated probability,
$$\text{Est}\ P(r_{ijk} = 1) = X/[ns(t-s)]. \tag{11}$$
There are $ns(t-s)$ comparisons of treatments of the first group with treatments of the second group, and it is now apparent that this special case reduces to a consideration of $ns(t-s)$ binomial trials. The test procedures reduce to those of the binomial or sign test.
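The reduction can be verified numerically with the sketch below, which evaluates (9), substitutes the result in (10), and compares with (11). The two-sided exact binomial p-value is added as one possible form of the sign test and is an assumption of this sketch; it uses success probability ½, which is the value (10) takes when $\pi = 1/t$. The data are hypothetical.

```python
from math import comb


def estimate_pi(x, n, s, t):
    """Estimator p of (9)."""
    return x / ((2 * s - t) * x + n * s * (t - s) ** 2)


def first_group_win_probability(pi, s, t):
    """P(r_ijk = 1) of (10) for i in the first group and j in the second."""
    return pi * (t - s) / (1 + (t - 2 * s) * pi)


def sign_test_p_value(x, trials):
    """Two-sided exact binomial p-value for success probability 1/2."""
    lower = sum(comb(trials, k) for k in range(0, x + 1)) / 2 ** trials
    upper = sum(comb(trials, k) for k in range(x, trials + 1)) / 2 ** trials
    return min(1.0, 2 * min(lower, upper))


# Hypothetical data: n = 5 repetitions, s = 2, t = 5, so ns(t-s) = 30 trials.
n, s, t, x = 5, 2, 5, 18
p = estimate_pi(x, n, s, t)
print(p)  # 0.25
print(first_group_win_probability(p, s, t), x / (n * s * (t - s)))
print(sign_test_p_value(x, n * s * (t - s)))
```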


2·2. The sign test


Dixon & Mood (1946) showed that the relative efficiency of the sign test in comparison with the $t$ test under assumptions suitable for the valid application of the latter test is $2/\pi$. Later, Dixon (1953) prepared small-sample tables of power efficiencies that indicated that the sign test compares more favourably with the $t$-test than indicated by the asymptotic value, $2/\pi$.

In considering applications of Case (ii), it is preferable to go directly to the use of the sign test based on the $ns(t-s)$ comparisons of treatments in the first group with those of the second group. It will usually be sufficient to estimate $P(r_{ijk} = 1)$ and not necessary to consider estimating $\pi$ itself. If required, the variance of $p$, the estimator of $\pi$, may be obtained from the variance of $X$ in an approximation through the use of usual formulae for the variance of a ratio.

We shall devote the remainder of this paper to a consideration of Case (i). 


III. ESTIMATION 
3·1. Asymptotic distribution of $(t-1)$ maximum-likelihood estimators
We shall require the large-sample distribution of the maximum-likelihood estimators, $p_1, \ldots, p_t$, of Case (i) and their asymptotic variances and covariances. These results will be of interest in themselves and useful in the development of subsequent sections of this paper. Some new notation will assist in this discussion. Let
$$f(x, \pi) = \prod_i \pi_i^{x_i} \prod_{i<j} (\pi_i + \pi_j)^{-1}, \tag{12}$$
where we use $x$ and $\pi$ to represent vectors, $(x_1, \ldots, x_t)$ and $(\pi_1, \ldots, \pi_t)$. Now $x_i$ is the number of times treatment $i$ obtains a ranking of unity in a repetition of paired comparisons. If $x_{ik}$ is the observation on $x_i$ in the $k$th of $n$ repetitions, the association with $a_i$ of (1) is
$$a_i = \sum_{k=1}^{n} x_{ik}. \tag{13}$$
The likelihood function $L$ in (1) is now $\prod_{k=1}^{n} f(x_k, \pi)$. It is also convenient to define
$$\left.\begin{aligned}\lambda_{ii} &= \frac{1}{\pi_i}\sum_j{}' \pi_j(\pi_i + \pi_j)^{-2} \quad (i = 1, \ldots, t),\\ \lambda_{ij} &= -(\pi_i + \pi_j)^{-2} \quad (i \ne j;\ i, j = 1, \ldots, t).\end{aligned}\right\} \tag{14}$$

We shall require the means, variances and covariances of $x_1, \ldots, x_t$. Let $z_{ij}$ be an indicator variate with the value unity if treatment $i$ ranks above treatment $j$ and zero otherwise. Then $x_i = \sum_j' z_{ij}$. $z_{ij}$ is a binomial variate with expectation $\pi_i/(\pi_i + \pi_j)$ and variance $\pi_i\pi_j/(\pi_i + \pi_j)^2$. The variates $z_{ij}$ making up the sum $x_i$ are independent in probability and it follows that
$$E(x_i) = \pi_i \sum_j{}' (\pi_i + \pi_j)^{-1} \quad (i = 1, \ldots, t), \tag{15}$$
$$V(x_i) = \pi_i \sum_j{}' \pi_j(\pi_i + \pi_j)^{-2} = \pi_i^2\lambda_{ii} \quad (i = 1, \ldots, t), \tag{16}$$
and
$$\operatorname{cov}(x_i, x_j) = -\pi_i\pi_j(\pi_i + \pi_j)^{-2} = \pi_i\pi_j\lambda_{ij} \quad (i \ne j;\ i, j = 1, \ldots, t). \tag{17}$$


The parameters $\pi_1, \ldots, \pi_t$ are not independent but subject to the restriction $\sum_i \pi_i = 1$. Accordingly we may regard $p_1, \ldots, p_{t-1}$ as maximum-likelihood estimators of the independent parameters, $\pi_1, \ldots, \pi_{t-1}$, taking $\pi_t = 1 - \sum_{i=1}^{t-1}\pi_i$. Then $\sqrt n\,(p_1 - \pi_1), \ldots, \sqrt n\,(p_{t-1} - \pi_{t-1})$ have a joint limiting normal distribution with zero means subject to the verification of certain regularity conditions. The required regularity conditions are quite well known and given, for example, by Cramér (1946, § 33·3, p. 500) and Chanda (1954), who fully states the conditions but for the continuous case. We have verified the conditions for the likelihood function $L$ of (1) expressed in terms of $f(x, \pi)$ of (12), but these demonstrations are omitted for brevity.

The dispersion matrix of the joint limiting normal distribution of our $(t-1)$ estimators is the matrix $[\lambda'_{ij}]^{-1}$ with the definition of $\lambda'_{ij}$ below. We need only note that
$$\frac{\partial \ln f}{\partial \pi_i} = \frac{x_i}{\pi_i} - \frac{x_t}{\pi_t} - \sum_j{}' (\pi_i + \pi_j)^{-1} + \sum_j{}' (\pi_t + \pi_j)^{-1} \quad (i = 1, \ldots, (t-1)), \tag{18}$$
and, from (15), that
$$E\left[\frac{\partial \ln f}{\partial \pi_i}\,\frac{\partial \ln f}{\partial \pi_j}\right] = \lambda'_{ij} \quad (i, j = 1, \ldots, (t-1)),$$
thereby defining $\lambda'_{ij}$. It follows from (16) and (17) that
$$\lambda'_{ij} = \lambda_{ij} - \lambda_{it} - \lambda_{jt} + \lambda_{tt}. \tag{19}$$
The matrix $[\lambda'_{ij}]$ is non-negative definite since it is a dispersion matrix and it is positive definite since $x_1, \ldots, x_{t-1}$, and hence $\partial \ln f/\partial \pi_1, \ldots, \partial \ln f/\partial \pi_{t-1}$, are free of linear restrictions.

The conclusion, with the established validity of the necessary conditions, is that $\sqrt n\,(p_1 - \pi_1), \ldots, \sqrt n\,(p_{t-1} - \pi_{t-1})$ have for large samples the multivariate normal distribution with zero means and dispersion matrix $[\lambda'_{ij}]^{-1}$.


3·2. Variances and covariances of estimators

In the preceding section, the parameter $\pi_t$ was considered to be a function of the remaining $(t-1)$ parameters. That process introduced asymmetry into the elements of the dispersion matrix, $[\lambda'_{ij}]^{-1}$. This lack of symmetry is essentially artificial and will now be removed.

The $t \times t$ matrix, $[\lambda_{ij}]$, is singular in view of the definitions (14) and the fact that
$$\pi_i\lambda_{ii} + \sum_j{}' \pi_j\lambda_{ij} = 0. \tag{20}$$
Then, if the elements of the last row and then of the last column of the matrix $[\lambda_{ij}]$ are subtracted from corresponding elements of the remaining rows and columns respectively,
$$|\lambda_{ij}| = \begin{vmatrix}[\lambda'_{ij}] & [\lambda_{it} - \lambda_{tt}]'\\ [\lambda_{it} - \lambda_{tt}] & \lambda_{tt}\end{vmatrix} = 0,$$
where $[\lambda_{it} - \lambda_{tt}]'$ and $[\lambda_{it} - \lambda_{tt}]$ are respectively column and row vectors of $(t-1)$ elements. It follows that
$$|\lambda'_{ij}| = \begin{vmatrix}[\lambda'_{ij}] & [\lambda_{it} - \lambda_{tt}]'\\ [\lambda_{it} - \lambda_{tt}] & 1 + \lambda_{tt}\end{vmatrix}. \tag{21}$$
Now, by reversing the process and adding the elements of the last row and then the last column to corresponding elements of the remaining rows and columns in (21), we have
$$|\lambda'_{ij}| = |\lambda_{ij} + 1| = -\begin{vmatrix}[\lambda_{ij}] & [1]'\\ [1] & 0\end{vmatrix}, \tag{22}$$
where $[1]$ and $[1]'$ are respectively row and column vectors of $t$ unit elements. The last step again depends on the result that $|\lambda_{ij}| = 0$. In the same way it can be shown that the cofactor of $\lambda_{ij}$ in the extreme right-hand side of (22) is equal to the cofactor of $\lambda'_{ij}$ in $|\lambda'_{ij}|$. Then, if $\sigma_{ij}$ is the covariance of $\sqrt n\,(p_i - \pi_i)$ and $\sqrt n\,(p_j - \pi_j)$ $(i, j = 1, \ldots, (t-1))$,
$$\sigma_{ij} = \frac{\text{cofactor of } \lambda_{ij} \text{ in } \begin{vmatrix}[\lambda_{ij}] & [1]'\\ [1] & 0\end{vmatrix}}{\begin{vmatrix}[\lambda_{ij}] & [1]'\\ [1] & 0\end{vmatrix}}. \tag{23}$$
But the formulation of the model for paired comparisons is symmetric in the parameters $\pi_1, \ldots, \pi_t$ and the estimators $p_1, \ldots, p_t$, and (23) applies for all variances and covariances with $i, j = 1, \ldots, t$ on the basis of its symmetry.

The variances and covariances obtainable from (23) are simply the elements of the $t$-square principal minor of the inverse of the matrix of the determinant in the denominator of (23). For small values of $t$, this inverse matrix may be evaluated by elementary methods when $\pi_1, \ldots, \pi_t$ are specified or replaced by estimates. However, since $|\lambda_{ij}| = 0$, the usual Doolittle methods of matrix inversion break down. Then for larger values of $t$, it appears desirable to invert $[\lambda'_{ij}]$ specified through (19) using a Doolittle method, thus obtaining the required variances and covariances with $i, j = 1, \ldots, (t-1)$. The remaining variance and covariances associated with $\sqrt n\,(p_t - \pi_t)$ are determined through the relationship
$$\sqrt n\,(p_t - \pi_t) = -\sum_{i=1}^{t-1}\sqrt n\,(p_i - \pi_i),$$
so that
$$\sigma_{tt} = \sum_{i=1}^{t-1}\sum_{j=1}^{t-1}\sigma_{ij} \quad\text{and}\quad \sigma_{it} = -\sum_{j=1}^{t-1}\sigma_{ij} \quad (i = 1, \ldots, (t-1)).$$

Since it has been established in § 3·1 that $\sqrt n\,(p_1 - \pi_1), \ldots, \sqrt n\,(p_{t-1} - \pi_{t-1})$ have asymptotically a multivariate normal distribution, and since $\sqrt n\,(p_t - \pi_t)$ is a linear function of those variates, we may now state that

$\sqrt n\,(p_1 - \pi_1), \ldots, \sqrt n\,(p_t - \pi_t)$ have, for large values of $n$, the singular multivariate normal distribution of $(t-1)$ dimensions in a space of $t$ dimensions with zero means and dispersion matrix, $[\sigma_{ij}]$, defined through (23).
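A minimal sketch of the computation described above follows. It builds $[\lambda_{ij}]$ from (14), borders it with unit vectors as in (22) and (23), inverts the bordered matrix by Gauss-Jordan elimination with pivoting (substituted here for the Doolittle method mentioned in the text), and returns the leading $t \times t$ block as $[\sigma_{ij}]$. Function names and the parameter values are hypothetical.

```python
def lambda_matrix(pi):
    """[lambda_ij] of (14) for the parameter vector pi."""
    t = len(pi)
    lam = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            if i != j:
                lam[i][j] = -1.0 / (pi[i] + pi[j]) ** 2
                lam[i][i] += pi[j] / (pi[i] + pi[j]) ** 2
        lam[i][i] /= pi[i]
    return lam


def invert(m):
    """Gauss-Jordan inverse with partial pivoting."""
    size = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def dispersion_matrix(pi):
    """[sigma_ij]: leading t x t block of the inverse of the bordered matrix."""
    t = len(pi)
    lam = lambda_matrix(pi)
    bordered = [lam[i] + [1.0] for i in range(t)] + [[1.0] * t + [0.0]]
    inverse = invert(bordered)
    return [row[:t] for row in inverse[:t]]


sigma = dispersion_matrix([1 / 3, 1 / 3, 1 / 3])
print(sigma[0][0], sigma[0][1])  # close to 8/81 and -4/81
```

For equal parameters the result agrees with the values $4(t-1)/t^4$ and $-4/t^4$ implied by § 4·2, and every row of $[\sigma_{ij}]$ sums to zero, as the singular distribution requires.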


In general, for large samples, we may take $p_1, \ldots, p_t$ to be jointly normally distributed with means $\pi_1, \ldots, \pi_t$, and dispersion matrix $[\sigma_{ij}/n]$. Then any linear function $\sum_i b_i p_i$ may be taken to be normal for large samples with mean $\sum_i b_i\pi_i$ and variance $\sum_i\sum_j b_i b_j \sigma_{ij}/n$. In particular, we may be interested in orthogonal linear comparisons of the sort often considered in the analysis of variance.

Estimated variances and covariances will usually be required. Through the consistency of maximum-likelihood estimators, we can define $\hat\lambda_{ij}$ $(i, j = 1, \ldots, t)$ to be the same functions of $p_1, \ldots, p_t$ as $\lambda_{ij}$ are of $\pi_1, \ldots, \pi_t$ as defined in (14). Then $[\hat\sigma_{ij}]$, the sample dispersion matrix, is the same function of $\hat\lambda_{ij}$ as $[\sigma_{ij}]$ is of $\lambda_{ij}$ in definition (23).

It has been shown (Bradley, 1953) that the functions of the parameters, $\ln \pi_1, \ldots, \ln \pi_t$, determine in a sense location points for the $t$ treatments on an arithmetic scale. Consequently, there may be some interest in obtaining
$$\operatorname{cov}(\ln p_i, \ln p_j) \cong \frac{\sigma_{ij}}{n\,\pi_i\pi_j} \quad (i, j = 1, \ldots, t), \tag{24}$$
as given, for example, by Hald (1952). If common logarithms are used, a factor, $(0·4343)^2$, is required in the right-hand member of (24).


IV. LARGE-SAMPLE DISTRIBUTIONS OF $T$

4·1. $T$ expanded in a series

It has been shown (Bradley & Terry, 1952) that, under the conditions of the null hypothesis of Case (i) of § 1·2, $T$ as defined in (5) has, asymptotically with $n$, the $\chi^2$-distribution with $(t-1)$ degrees of freedom. In order to investigate the power of the test procedure based on $T$ and to consider the efficiency of the method in comparison with other test procedures, we require the distribution of $T$ under the alternative hypothesis $H_1$ of Case (i). In this section we shall show that $T$ has the same limiting distribution as a certain sum of squares.

Let
$$y_i = (tp_i - 1) \quad (i = 1, \ldots, t), \tag{25}$$
and note that from (3)
$$a_i = \tfrac12 n(1 + y_i)\sum_j{}' [1 + \tfrac12(y_i + y_j)]^{-1} \tag{26}$$
for all $i$, and
$$\sum_i a_i = \tfrac12 nt(t-1). \tag{27}$$
Substitution in (5) through the use of (6) and upon reduction based on (26) and (27) yields
$$T = n\sum_i (1 + y_i)[\ln(1 + y_i)]\sum_j{}' [1 + \tfrac12(y_i + y_j)]^{-1} - 2n\sum_{i<j}\ln[1 + \tfrac12(y_i + y_j)]. \tag{28}$$
It is next possible, by expanding the right-hand member of (28) in a power series in the variates, to show that
$$T = \tfrac14 nt\sum_i y_i^2 + R(y_i), \tag{29}$$

where R(y;) depends on higher powers of the variates than the second. In turn, it can be 
shown that 


| Ry) | < 057—8) | y; [+таҗ(#— 14) X edm - 2) У [и [P+ У | yc y, 

1 i<j 

2 
жи |+ ИТУ +)". (89) 
Уи] 1 +@ || up 41+10; - )] ( 
The result, (30), is obtained through algebraic manipulation involving the use of the general relations
$$|\ln(1 + a) - a + \tfrac12 a^2| < |a|^3, \qquad (1 + z)^{-1} = 1 - z + z^2/(1 + z),$$
and the particular relations
$$\sum_j{}' y_j = -y_i \quad\text{and}\quad 2\sum_{i<j} y_i y_j = -\sum_i y_i^2.$$

Let us suppose that $\sqrt n\,y_i$ $(i = 1, \ldots, t)$ have limiting distributions and this will be demonstrated under desired conditions. Assuming this result, we may then state that $n^{\frac12 - \epsilon} y_i$, for any $\epsilon > 0$, converges in probability to zero, for it is an easily proved theorem in probability that, if $\{X_N\}$ represents a sequence of random variables with a limiting distribution function and if $\{x_N\}$ is a sequence of constants approaching zero as $N \to \infty$, then $\{x_N X_N\}$ is a sequence of random variables converging in probability to zero as $N \to \infty$. It follows from Slutsky's theorem (Cramér, 1946, § 20·6, p. 255) that, if $\sqrt n\,y_i$ have limiting distributions, $R(y_i)$ converges stochastically to zero. This is sufficient (Cramér, 1946, § 20·6, p. 254) to state that $T$ has the same limiting distribution as $\tfrac14 nt\sum_i y_i^2$.


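4·2. Limiting distribution of $y_i$ under $H_1$
The normal equations for the maximum-likelihood estimators were given in (3) and (4), and these relations are useful in obtaining the limiting distribution of $\sqrt n\,y_i$ defined in (25) under modified specifications of $\pi_1, \ldots, \pi_t$. From (3) or (26), we have
$$\frac{a_i}{n} = \tfrac12(1 + y_i)\sum_j{}' [1 + \tfrac12(y_i + y_j)]^{-1} \quad (i = 1, \ldots, t). \tag{31}$$
Expansion in (31), using the negative binomial and the result that $\sum_j' y_j = -y_i$, and multiplication by $\sqrt n$ lets us write
$$\sqrt n\left[\frac{a_i}{n} - \tfrac12(t-1)\right] = \tfrac14 t\sqrt n\,y_i + \sqrt n\,R'(y_i), \tag{32}$$
where $R'(y_i)$ consists of terms of the second and higher orders in the variates. Now, from an argument similar to that used in § 4·1, if $\sqrt n\,y_i$ has a limiting distribution, it has the same limiting distribution as
$$4\sqrt n\left[\frac{a_i}{n} - \tfrac12(t-1)\right]\Big/ t. \tag{33}$$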

Let us now redefine the parameters so that
$$\pi_i = t^{-1} + n^{-\frac12}\delta_{in} \quad (i = 1, \ldots, t), \tag{34}$$
where $\delta_{in}$ represents a sequence of constants converging to $\delta_i$ as $n \to \infty$. This redefinition of the parameters is an artifice that permits the evaluation of the limiting distribution of $T$ and thence the power of the test procedure. This essentially means that we are investigating the asymptotic power for alternatives in the locality of the parameter point determined by the null hypothesis. This is necessary in order that we do not merely show that the test procedure is consistent. Under these conditions we require the limiting distribution of the variate defined in (33).

Let $\bar z_{ij}$ be the average over the $n$ repetitions of the binomial variate, $z_{ij}$, defined in § 3·1. Then, through the definitions of $x_i$ and $z_{ij}$ and (13), we have
$$a_i = n\sum_j{}' \bar z_{ij} \tag{35}$$
and the variates, $\bar z_{i1}, \ldots, \bar z_{it}$, omitting $\bar z_{ii}$, are independent.

Since $E(z_{ij}) = \pi_i/(\pi_i + \pi_j)$ we may define $\phi_n(\tau)$ to be the characteristic function of $2\sqrt n\,(\bar z_{ij} - \tfrac12)$, and, following the method of Cramér (1946) in § 16·4 of his book, we have
$$\phi_n(\tau) = \left[\pi_i(\pi_i + \pi_j)^{-1} e^{\sqrt{-1}\,\tau/\sqrt n} + \pi_j(\pi_i + \pi_j)^{-1} e^{-\sqrt{-1}\,\tau/\sqrt n}\right]^n. \tag{36}$$


Expansion of the exponentials in (36) yields 
T лн 
al > (37) 


ts los | Rs (m 
Galt) = [ H Gd k m 7 4-7; 

where у and Û are real or complex quantities less than unity in modulus. 

We now substitute from (34) for the parameters in (37) and write 


2 n 
$,(7) = fiaa- -a | y 


tr Ёт t 1 
where H = J-1 g 0 =0; -8jn +0) -4— 1 CA (02, —95,) [ + g Jn б + 27 
ex Totis aer] [ 955] 
and = sale? -y9)44 Jh ws TES Jn 


From the convergence of $\delta_{in}$ to $\delta_i$ and the forms of $H$ and $W$, for any $\epsilon, \eta > 0$, there exists $n_0(\epsilon, \eta)$ such that, when $n > n_0$, $|H| < \epsilon$ and $|W| < \eta$ for fixed $\tau$. It follows that
$$\exp[-\tfrac12\tau^2(1 + \eta) + \tfrac12\sqrt{-1}\,t\tau(\delta_i - \delta_j) - \epsilon] \le \lim \phi_n(\tau) \le \exp[-\tfrac12\tau^2(1 - \eta) + \tfrac12\sqrt{-1}\,t\tau(\delta_i - \delta_j) + \epsilon],$$
and hence
$$\lim \phi_n(\tau) = \exp[-\tfrac12\tau^2 + \tfrac12\sqrt{-1}\,t\tau(\delta_i - \delta_j)]. \tag{38}$$
This result (38) is sufficient (see, for example, Cramér, 1946, § 10·4) to state that $2\sqrt n\,(\bar z_{ij} - \tfrac12)$ has a normal limiting distribution with mean $\tfrac12 t(\delta_i - \delta_j)$ and unit variance.

We may obtain the variances and covariances of $4\sqrt n\,[a_i/n - \tfrac12(t-1)]/t$ through use of the relation (35). It is easy to show that
$$\frac{4}{t}\sqrt n\left[\frac{a_i}{n} - \tfrac12(t-1)\right] = \frac{2}{t}\sum_j{}' [2\sqrt n\,(\bar z_{ij} - \tfrac12)], \tag{39}$$
and we note also that by definition
$$\bar z_{ij} = 1 - \bar z_{ji}. \tag{40}$$
The mean of the left-hand member of (39) in the limit is now seen to be $\sum_j' (\delta_i - \delta_j) = t\delta_i$ since $\sum_j' \delta_j = -\delta_i$. Through the variance of $2\sqrt n\,(\bar z_{ij} - \tfrac12)$ and the independence of the variates in the sum in (39), the limiting variance of the left-hand member of (39) is $4(t-1)/t^2$. $\bar z_{ij}$ and $\bar z_{gh}$ are independent unless $i = g$ and $j = h$ or $i = h$ and $j = g$. Then in view of (40), the covariance of $4\sqrt n\,[a_i/n - \tfrac12(t-1)]/t$ and $4\sqrt n\,[a_g/n - \tfrac12(t-1)]/t$ $(i \ne g)$ is $-4/t^2$. It follows that $4\sqrt n\,[a_1/n - \tfrac12(t-1)]/t, \ldots, 4\sqrt n\,[a_t/n - \tfrac12(t-1)]/t$, and consequently $\sqrt n\,y_1, \ldots, \sqrt n\,y_t$, have a joint limiting normal distribution with means $t\delta_1, \ldots, t\delta_t$ and equal variances and covariances $4(t-1)/t^2$ and $-4/t^2$ respectively under the conditions (34) on the parameters as $n \to \infty$. Since $\sum_i y_i = 0$, the limiting distribution is necessarily singular.

It is to be noted that this same limiting distribution can apparently be obtained by using the definition (34) for the parameters and relying on the use of the joint limiting distribution of $\sqrt n\,(p_1 - \pi_1), \ldots, \sqrt n\,(p_t - \pi_t)$ developed in § 3·2 above. If $\pi_i$ is replaced by $t^{-1} + n^{-\frac12}\delta_{in}$ in the definitions of $\lambda_{ij}$ in (14), and if $\lambda_{ij}$ is then replaced by $\lim_{n \to \infty}\lambda_{ij}$ in (23), the correct variances and covariances would be obtained for $\sqrt n\,y_1, \ldots, \sqrt n\,y_t$. That this procedure is valid has been proved by Wald (1943) but we have obtained the results directly.


RALPH ALLAN BRADLEY 459 


4·3. Limiting distribution of T under $H_1$: $\pi_i = t^{-1} + n^{-1/2}\delta_{in}$
We define
$$u_i = \tfrac12\sqrt{nt}\;y_i, \qquad (41)$$
and, from § 4·1, T has the same limiting distribution as $\sum_i u_i^2$. $u_1, \ldots, u_t$ have a joint limiting distribution that is multivariate normal with means $\tfrac12 t^{3/2}\delta_1, \ldots, \tfrac12 t^{3/2}\delta_t$, variances $(t-1)/t$ and covariances $-1/t$. Further, $\sum_i u_i = 0$.

The Helmert transformation is used to transform to new variables, $z_1, \ldots, z_t$ with $z_t = 0$, and hence T has the same limiting distribution as $\sum_{i=1}^{t-1} z_i^2$. It is easily verified that $z_1, \ldots, z_{t-1}$ are independent, have unit variances, and have limiting normal distributions under $H_1$ with means, say, $\theta_1, \ldots, \theta_{t-1}$, where $\sum_{i=1}^{t-1} \theta_i^2 = \tfrac14 t^3 \sum_{i=1}^{t} \delta_i^2$. This is sufficient to permit the final conclusion that $\sum_{i=1}^{t-1} z_i^2$ and T have a limiting distribution under $H_1$: $\pi_i = t^{-1} + n^{-1/2}\delta_{in}$, as $n \to \infty$, that is, the non-central $\chi^2$-distribution with $(t-1)$ degrees of freedom and parameter of non-centrality, $\lambda = \tfrac14 t^3 \sum_i \delta_i^2$. $\lambda$ is defined if we reiterate that the limiting distribution function of T under $H_1$ is obtained by integrating the non-central $\chi^2$-density,
$$f(T) = \frac{e^{-\frac12\lambda}\, e^{-\frac12 T}\, T^{\frac12(t-3)}}{2^{\frac12(t-1)}} \sum_{r=0}^{\infty} \frac{(\lambda T)^r}{2^{2r}\, r!\; \Gamma\{\tfrac12(t-1) + r\}}. \qquad (42)$$
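The Helmert step can be checked numerically. The following Python sketch builds one common form of the Helmert matrix (the row ordering and the names `helmert`, `matvec` and `transformed_covariance` are assumptions made for this illustration, not taken from the paper). It verifies that, for any $u$ with $\sum_i u_i = 0$, the last transformed variable vanishes and the sum of squares is preserved, and that a covariance matrix with variances $(t-1)/t$ and covariances $-1/t$ is carried into diag(1, ..., 1, 0).

```python
import math

def helmert(t):
    """Orthogonal t x t Helmert matrix: t-1 contrast rows, then the constant row."""
    rows = []
    for k in range(1, t):
        norm = math.sqrt(k * (k + 1))
        rows.append([1.0 / norm] * k + [-k / norm] + [0.0] * (t - k - 1))
    rows.append([1.0 / math.sqrt(t)] * t)
    return rows

def matvec(m, v):
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transformed_covariance(t):
    """H C H' for C with variances (t-1)/t and covariances -1/t."""
    h = helmert(t)
    c = [[(t - 1.0) / t if i == j else -1.0 / t for j in range(t)] for i in range(t)]
    hc = [[sum(h[i][k] * c[k][j] for k in range(t)) for j in range(t)] for i in range(t)]
    return [[sum(hc[i][k] * h[j][k] for k in range(t)) for j in range(t)] for i in range(t)]

u = [0.5, -1.25, 0.75, 0.0]          # arbitrary illustrative values summing to zero
z = matvec(helmert(len(u)), u)
print(z[-1], sum(x * x for x in z[:-1]), sum(x * x for x in u))
```

The printed triple shows the vanishing last coordinate followed by the two sums of squares, which agree up to rounding.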


The power of the paired comparisons test based on T is asymptotically given by
$$\beta(\lambda \mid \alpha, t-1, \infty) = \int_{\chi^2_{\alpha,\,t-1}}^{\infty} f(T)\,dT, \qquad (43)$$
where $\chi^2_{\alpha,\,t-1}$ is the $\alpha$-level significance value of a central $\chi^2$-distribution with $(t-1)$ degrees of freedom. Fix (1949) gave tables for $\alpha = 0·01$ and $\alpha = 0·05$ with $\lambda$ tabulated for given values of $\beta$ and for given degrees of freedom. Pearson & Hartley (1951) provided charts for given degrees of freedom and $\alpha = 0·01$ and $0·05$, whereon $\beta$ is plotted against $\phi$. These charts are for the non-central F-distribution but may be used for the non-central $\chi^2$-distribution if the denominator degrees of freedom of F are taken to be infinite. The notation $\beta(\lambda \mid \alpha, \nu_1, \nu_2)$ was used by Pearson & Hartley and has been modified for our special case in (43). Here
$$\phi = \sqrt{\lambda/t} = \tfrac12 t \sqrt{\textstyle\sum_i \delta_i^2}. \qquad (44)$$


Tang (1938) presented tables based on $\phi$ that may also be used to evaluate the power. For large n, an adequate approximation to the power of the paired comparisons test may be obtained by assuming the non-central $\chi^2$-distribution and taking the parameter
$$\lambda = \tfrac14 n t^3 \sum_i (\pi_i - 1/t)^2. \qquad (45)$$

When $H_0$ is true, $\pi_i = 1/t$ and $\delta_i = \delta_{in} = 0$ $(i = 1, \ldots, t)$, and the limiting distribution of T reduces to the central $\chi^2$-distribution with $(t-1)$ degrees of freedom. This establishes the approximate large-sample distribution for the test of $H_0$ given by Bradley & Terry (1952).
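As a rough check on (43) to (45), the power can be evaluated directly rather than read from charts. The Python sketch below uses only the standard library and relies on the well-known representation of the non-central $\chi^2$ upper tail as a Poisson-weighted mixture of central $\chi^2$ tails. It is restricted to even degrees of freedom (odd t), where the central tail has a closed form; the function names are hypothetical and the restriction is a simplification of this sketch, not of the paper. The parameter values are those of the hypothetical case treated in § 6·2.

```python
import math

def noncentrality(pi, n):
    """Approximate lambda from (45): (1/4) n t^3 sum_i (pi_i - 1/t)^2."""
    t = len(pi)
    return 0.25 * n * t**3 * sum((p - 1.0 / t) ** 2 for p in pi)

def chi2_sf_even(x, df):
    """Upper-tail probability of a central chi-square with even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    half = 0.5 * x
    term, total = 1.0, 0.0
    for i in range(df // 2):
        total += term
        term *= half / (i + 1)
    return math.exp(-half) * total

def chi2_critical_even(alpha, df):
    """Solve chi2_sf_even(c, df) = alpha for c by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power(lam, alpha, df, terms=200):
    """Non-central chi-square upper tail as a Poisson mixture of central tails."""
    c = chi2_critical_even(alpha, df)
    weight = math.exp(-0.5 * lam)      # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        total += weight * chi2_sf_even(c, df + 2 * j)
        weight *= 0.5 * lam / (j + 1)
    return total

lam = noncentrality([0.28, 0.59, 0.13], n=10)
phi = math.sqrt(lam / 3)
print(round(lam, 2), round(phi, 2), round(power(lam, 0.05, 2), 2))
```

Values read from tables by interpolation can be expected to differ from such a direct evaluation in the second decimal place.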


460 Rank analysis of incomplete block designs. III 


V. COMPARISONS OF POWERS OF TESTS 
5·1. Comparison of the test for paired comparisons with a multi-binomial test
One of the chief uses of an investigation of the power of a statistical test is an assessment of the merits of the test in contrast with alternative procedures. In this and the following section we shall compare the test based on T with two other possible methods.

The model formulated for the method of paired comparisons is not the most general one of its form possible. We could postulate parameters $\pi_{ij}$, the probability of treatment i being rated above treatment j in the comparison of these two treatments. Then the n repetitions of the paired comparisons design could be regarded as $\tfrac12 t(t-1)$ unrelated binomial tests, each test depending on n trials.

Let us consider the comparison of treatment i with treatment j. The usual test of the null hypothesis, $\pi_{ij} = \tfrac12$, is based on the statistic,
$$S_{ij} = 4n(p_{ij} - \tfrac12)^2, \qquad (46)$$
where $p_{ij}$ is the estimator of $\pi_{ij}$. Further,
$$S_{ij} = 2n(p_{ij} - \tfrac12)^2 + 2n(p_{ji} - \tfrac12)^2 = u_{ij}^2 + u_{ji}^2,$$
and $u_{ij}$ and $u_{ji}$ correspond to variates $u_i$ defined in (41). This binomial test is only a special case, t = 2, of the paired comparisons method being discussed in this paper. Then, under the null hypothesis, $S_{ij}$ has a limiting $\chi^2$-distribution with 1 degree of freedom. (This result is of course well known.) Further, if the alternative hypothesis is expressed as
$$\pi_{ij} = \tfrac12 + \mu_{ijn}/\sqrt{n}$$
with $\mu_{ijn}$ converging to $\mu_{ij}$ as $n \to \infty$, $S_{ij}$ has a limiting non-central $\chi^2$-distribution with 1 degree of freedom and parameter of non-centrality, $\lambda_{ij} = 2(\mu_{ij}^2 + \mu_{ji}^2)$, where $\pi_{ij} = 1 - \pi_{ji}$ and $\mu_{ij} = -\mu_{ji}$. The distribution of $S_{ij}$ under the alternative hypothesis follows as a special case of the theory developed for paired comparisons.
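Suppose that the model for paired comparisons is appropriate. Then, $\pi_{ij} = \pi_i/(\pi_i + \pi_j)$ and, if $\pi_i = t^{-1} + n^{-1/2}\delta_{in}$, as before,
$$\mu_{ijn} = \tfrac14 t(\delta_{in} - \delta_{jn})\{1 + \tfrac12 t\, n^{-1/2}(\delta_{in} + \delta_{jn})\}^{-1}.$$
It follows at once that
$$\mu_{ij} = \tfrac14 t(\delta_i - \delta_j), \qquad (47)$$
and
$$\lambda_{ij} = \tfrac14 t^2(\delta_i - \delta_j)^2. \qquad (48)$$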


If an overall test of $H_0$, the equality of all treatment parameters, here called the multi-binomial test, were made on the basis of the $\tfrac12 t(t-1)$ independent sets of binomial trials, the appropriate statistic would be
$$S = \sum_{i<j} S_{ij}. \qquad (49)$$
From the foregoing argument and the additive property of the $\chi^2$-distribution, under $H_0$, S has the central $\chi^2$-distribution with $\tfrac12 t(t-1)$ degrees of freedom and, under $H_1$, S has the non-central $\chi^2$-distribution with the same degrees of freedom and parameter of non-centrality,
$$\lambda' = \sum_{i<j} \lambda_{ij} = \tfrac14 t^2 \sum_{i<j} (\delta_i - \delta_j)^2 = \tfrac14 t^3 \sum_i \delta_i^2 = \lambda, \qquad (50)$$
in the limit with n. The simplification in (50) is possible since $\sum_i \delta_i = 0$.
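The simplification in (50) rests on the identity $\sum_{i<j}(\delta_i - \delta_j)^2 = t\sum_i \delta_i^2 - (\sum_i \delta_i)^2$, whose last term vanishes under the constraint. A short Python check, with arbitrary illustrative values of $\delta_i$ and hypothetical function names, is:

```python
def lambda_multibinomial(delta):
    """Sum over pairs i < j of (1/4) t^2 (delta_i - delta_j)^2, as in (48)-(50)."""
    t = len(delta)
    return sum(0.25 * t**2 * (delta[i] - delta[j]) ** 2
               for i in range(t) for j in range(i + 1, t))

def lambda_paired(delta):
    """(1/4) t^3 sum_i delta_i^2, the parameter for the paired comparisons test."""
    t = len(delta)
    return 0.25 * t**3 * sum(d * d for d in delta)

delta = [0.4, -0.1, 0.3, -0.6]       # illustrative values with zero sum
print(lambda_multibinomial(delta), lambda_paired(delta))
```

The two tests therefore share the same non-centrality parameter and differ only in their degrees of freedom.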


The paired comparisons test and the multi-binomial test under the conditions appropriate to the former have asymptotic power functions that differ only in the numbers of degrees of freedom that are available. The asymptotic powers of the two tests are clearly different. In Figs. 1 and 2 we have plotted the power $\beta$ against $\lambda$ for t = 3 and t = 4 for both tests based on both 0·05 and 0·01 levels of significance. The $\lambda$-scale is logarithmic and the power curves were obtained from similar charts given by Pearson & Hartley (1951). The horizontal scales on our charts are for $\lambda$, the element common to the parameters of the test considered. To use the Pearson & Hartley charts we computed $\phi = \sqrt{\lambda/t}$ for the paired comparisons test as given in (44) and we required
$$\phi' = \sqrt{\frac{\lambda}{\tfrac12 t(t-1) + 1}} \qquad (51)$$
for the multi-binomial test. It is to be noted that, when $\lambda = 0$, $\beta = 0·05$ or $0·01$ for both tests. Further, as one would expect, $\beta \to 1$ as $\lambda \to \infty$ for both tests. In the intervening $\lambda$-region, the T-test is more powerful than the multi-binomial test, and there is an indication that the advantage of the paired comparisons test increases sharply with t, the number of treatments.

Power curves for an analysis-of-variance test are also given in Figs. 1 and 2. The com- 
parison of paired comparisons with analysis of variance is discussed in the next section. 


5·2. Comparison of the test for paired comparisons with the analysis of variance
A correspondence between the parameters of the method of paired comparisons and those of analysis of variance may be established by referring to the discussion of Thurstone's model for paired comparisons given by Mosteller (1951). Mosteller summarizes the Thurstone model by writing*
$$P(X_i > X_j) = \frac{1}{\sqrt{2\pi}} \int_{-(S_i - S_j)/\sigma_d}^{\infty} e^{-\frac12 y^2}\,dy, \qquad (52)$$
where $S_i$ is the location point for the variate $X_i$ on a subjective continuum and $\sigma_d^2$ is the variance of the difference $(X_i - X_j)$. It was assumed that the variance of $X_i$ is $\sigma^2$ and the covariance of $X_i$ and $X_j$ is $\rho\sigma^2$. Then $\sigma_d^2 = 2\sigma^2(1 - \rho)$. If $X_i$ and $X_j$ can be observed in paired comparisons, the usual additive model of the analysis of variance would require that $\rho = 0$, $S_i$ be the effect of treatment i, and $\sigma^2$ be the common variance of the random elements of that additive model. For our purposes, the Thurstone model may more generally be associated with any analysis of variance with the same variance $\sigma^2$ for the random element of its additive model and with $S_i$ as the effect of treatment i.

The present author (1953) has shown that, for the method of paired comparisons here discussed, one can write
$$P(X_i > X_j) = \frac{\pi_i}{\pi_i + \pi_j} = \frac14 \int_{-(\ln \pi_i - \ln \pi_j)}^{\infty} \operatorname{sech}^2 \tfrac12 y \,dy = \frac{1}{4a} \int_{-a(\ln \pi_i - \ln \pi_j)}^{\infty} \operatorname{sech}^2 \frac{y}{2a}\,dy. \qquad (53)$$

A correspondence between $S_i$ and $\pi_i$ is required to compare the method of paired comparisons with the analysis of variance. That the forms of the integrand in (52) and (53) do not matter for large-sample sizes can be demonstrated.

* Mosteller indicated in his discussion that no generality was lost by taking $\sigma_d = 1$ and he wrote (52) with that restriction.


[Fig. 2. Asymptotic power functions.]

* For a derivation when f(y) is normal, see Cochran (1937). The result is used by Dixon & Mood (1946) in discussing the statistical sign test.

V(c,) does not depend on the form of f(y) but only on the ordinate f( — ‚же 
establish a correspondence between $S_i$ and $\pi_i$ when we adjust the scale in (53) so that the ordinate is equal to $1/\sqrt{2\pi}$. The ordinate in both (52) and (53) is evaluated at y = 0 since $\ln \pi_i - \ln \pi_j = O(n^{-1/2})$ because of the definition, $\pi_i = t^{-1} + n^{-1/2}\delta_{in}$. It follows that
$$a = \sqrt{\pi/8} \qquad (54)$$
and $\sqrt{\pi/8}\,\ln \pi_i$ corresponds to $S_i/\sigma_d$.

If in the Thurstone model we assume that $\rho = 0$ as required for the association with analysis of variance, $\sqrt{2}\,S_i/\sigma_d$ represents the location point (or treatment effect) for the ith treatment on a subjective continuum (or additive scale) whereon the variance of an observation is unity. In terms of the parameters of the method of paired comparisons, the location point is $\tfrac12\sqrt{\pi}\,\ln \pi_i$. If $n(t-1)$ observations are made on each variate $X_i$ in the analysis of variance (this corresponds to the number of times a treatment appears in the method of paired comparisons), the parameter of non-centrality for the F-test would be approximately
$$\lambda'' = \tfrac14 \pi n (t-1) \sum_i \Bigl(\ln \pi_i - \frac1t \sum_j \ln \pi_j\Bigr)^2. \qquad (55)$$
For large samples the non-central F-distribution approaches the distribution of the non-central $\chi^2$ with $(t-1)$ degrees of freedom and parameter $\lambda''$, where
$$\lambda'' = \tfrac14 \pi t^2 (t-1) \sum_i \delta_i^2 = \pi(t-1)\lambda/t \qquad (56)$$
in view of (34) and (50).

The power functions for analysis of variance are plotted in Figs. 1 and 2. To plot these curves, we computed $\phi'' = \sqrt{\pi(t-1)\lambda/t^2}$ in order to use the Pearson & Hartley charts. It is clear that analysis of variance is superior to paired comparisons and the multi-binomial procedure for t = 3 and 4 and the advantage increases with t. On the other hand, for comparative purposes, we have assumed that conditions for the valid application of analysis of variance could be attained while the method of paired comparisons was devised for situations where the analysis of variance may not be used.
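The passage from (55) to (56) can be illustrated numerically. In the Python sketch below (function names and the chosen $\delta_i$ are illustrative assumptions), the parameters are set to $\pi_i = t^{-1} + n^{-1/2}\delta_i$ for a large n, and the value of (55) is compared with the limiting form (56).

```python
import math

def lambda_F(pi, n):
    """(55): (pi/4) n (t-1) sum_i (ln pi_i - mean of ln pi_j)^2."""
    t = len(pi)
    logs = [math.log(p) for p in pi]
    mean = sum(logs) / t
    return 0.25 * math.pi * n * (t - 1) * sum((x - mean) ** 2 for x in logs)

def lambda_F_limit(delta):
    """(56): (pi/4) t^2 (t-1) sum_i delta_i^2."""
    t = len(delta)
    return 0.25 * math.pi * t**2 * (t - 1) * sum(d * d for d in delta)

delta = [0.3, -0.1, -0.2]            # illustrative values with zero sum
n = 10**6
pi_n = [1.0 / len(delta) + d / math.sqrt(n) for d in delta]
print(lambda_F(pi_n, n), lambda_F_limit(delta))
```

The two values agree to within a relative error of order $n^{-1/2}$, which is the sense in which (56) holds "for large samples".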


5·3. Relative efficiencies

The relative efficiency of one test to another may be obtained from the limiting ratio of sample sizes required for equal powers. The concept depends on local properties of the power functions of the two tests being compared. Relative efficiency is the limit of the inverse ratio of sample sizes required in two tests in order that the tests have equal powers for alternatives approaching, with increasing sample sizes, parameter values specified by the null hypothesis. The methods employed in this section parallel those developed by Pitman (1948). We shall first obtain the asymptotic relative efficiency of paired comparisons to analysis of variance by elementary means. Then, to complete the inter-comparisons, we use a more general theorem of Pitman, as presented and extended by Noether (1955), to obtain the relative efficiencies of the multi-binomial procedure to paired comparisons and to the analysis of variance.

Let $n''$ be the sample size for the analysis of variance and n the sample size for paired comparisons. For comparable treatment differences with the ratio $n''/n$ fixed, we need $\delta''_{in''}$ corresponding to $\delta_{in}$ of (34) such that
$$\lim_{n,\,n'' \to \infty} \frac{\delta''_{in''}}{\delta_{in}} = 1. \qquad (57)$$


Now we have already seen (§ 4·3) that for the method of paired comparisons,
$$\lambda = \tfrac14 t^3 \sum_i \delta_i^2 = \lim_{n \to \infty} \tfrac14 n t^3 \sum_i \delta_{in}^2/n,$$
and we can write, from (56),
$$\lambda'' = \lim_{n'' \to \infty} \tfrac14 \pi n'' t^2 (t-1) \sum_i \delta''^{\,2}_{in''}/n''.$$
The test situations are comparable when (57) holds and the asymptotic powers are identical when $\lambda = \lambda''$. Consequently the ratio of sample sizes for equal powers is asymptotically
$$\lim_{n,\,n'' \to \infty} \frac{n''}{n} = \frac{t}{\pi(t-1)}, \qquad (58)$$
and this is the asymptotic relative efficiency of the method of paired comparisons to the analysis of variance.

The theorems of Pitman and Noether yield relative efficiencies in terms of the efficacies of the tests being compared. For a statistic $T_N$ based on a sample of size N, define
$$E(T_N) = \psi_N(\theta; T_N) \quad\text{and}\quad \operatorname{var}(T_N) = \sigma_N^2(\theta; T_N).$$
The efficacy of the test of $H_0$: $\theta = \theta_0$ against the alternative $H_1$: $\theta > \theta_0$ is
$$R_N^{1/m\delta}(\theta_0; T_N) = \{\psi_N^{(m)}(\theta_0; T_N)/\sigma_N(\theta_0; T_N)\}^{1/m\delta}, \qquad (59)$$
where $\theta_N = \theta_0 + k/N^{\delta}$. It is assumed that a particular alternative $\theta = \theta_N$ changes with the sample size N in such a way that $\lim_{N \to \infty} \theta_N = \theta_0$.

We have three tests of the hypothesis $H_0$: $\pi_i = 1/t$ with the alternative,
$$H_1: \pi_i = t^{-1} + n^{-1/2}\delta_{in}, \quad \delta_{in} \to \delta_i \text{ as } n \to \infty \quad (i = 1, \ldots, t).$$
The statistics, $T_n$ for the method of paired comparisons, $S_n$ for the multi-binomial procedure, and $F_n$ for the analysis of variance, all have asymptotically non-central $\chi^2$-distributions with parameters,
$$\lambda = \lim_{n \to \infty} \tfrac14 n t^3 \theta_n, \quad \lambda' = \lim_{n \to \infty} \tfrac14 n t^3 \theta_n, \quad \lambda'' = \lim_{n \to \infty} \tfrac14 n \pi t^2 (t-1) \theta_n, \qquad (60)$$
and degrees of freedom, $(t-1)$, $\tfrac12 t(t-1)$, $(t-1)$ respectively. In (60)
$$\theta_n = \sum_i \delta_{in}^2/n. \qquad (61)$$
The subscripts were added to the statistics T, S and F in order to provide a notation consistent with the general definition (59) and to emphasize the dependence of the statistics on n. The hypothesis $H_0$ is true when $\theta_n = \theta_0 = 0$. We may take $\psi_n(\theta; T_n)$, $\psi_n(\theta; S_n)$ and $\psi_n(\theta; F_n)$, corresponding to $\psi_N(\theta; T_N)$ in (59), respectively to be*
$$(t-1) + \tfrac14 n t^3 \theta, \quad \tfrac12 t(t-1) + \tfrac14 n t^3 \theta \quad\text{and}\quad (t-1) + \tfrac14 n \pi t^2 (t-1)\theta.$$
Differentiation with respect to $\theta$ yields (for all values of $\theta$ including $\theta = 0$)
$$\psi'_n(\theta; T_n) = \tfrac14 n t^3, \quad \psi'_n(\theta; S_n) = \tfrac14 n t^3 \quad\text{and}\quad \psi'_n(\theta; F_n) = \tfrac14 n \pi t^2 (t-1). \qquad (62)$$

* Our values of $\psi_n$ and $\sigma_n^2$, the latter given in (63), are not the means and variances of the statistics for finite n but rather were calculated from the limiting distributions. That this is permissible under conditions which may be verified for our tests was noted by Noether.


From the non-central $\chi^2$-distributions,
$$\sigma_n^2(0; T_n) = 2(t-1), \quad \sigma_n^2(0; S_n) = t(t-1) \quad\text{and}\quad \sigma_n^2(0; F_n) = 2(t-1). \qquad (63)$$
The efficacies of the three tests are, with $\delta = 1$, $m = 1$ in (59) in each case,
$$R_n(0; T_n) = \frac{n t^3}{4\sqrt{2(t-1)}}, \quad R_n(0; S_n) = \frac{n t^3}{4\sqrt{t(t-1)}} \quad\text{and}\quad R_n(0; F_n) = \tfrac14 n \pi t^2 \sqrt{\tfrac12(t-1)}. \qquad (64)$$
The relative efficiencies of the three tests taken in pairs are given by the limits of the ratios of corresponding values of R, subject to conditions set forth in the reference and which may be verified here. The relative efficiencies of S to T and to F are
$$\text{R.E. (S to T)} = \lim_{n \to \infty} \frac{R_n(0; S_n)}{R_n(0; T_n)} = \left(\frac{2}{t}\right)^{\frac12} \qquad (65)$$
and
$$\text{R.E. (S to F)} = \lim_{n \to \infty} \frac{R_n(0; S_n)}{R_n(0; F_n)} = \frac{(2t)^{\frac12}}{\pi(t-1)}. \qquad (66)$$
Table 1. Asymptotic relative efficiencies
(T, paired comparisons; S, multi-binomial; F, analysis of variance)

t         2      3      4      5      6      7      8      9      10     ∞
T to F    63·7   47·7   42·4   39·8   38·2   37·1   36·4   35·8   35·4   31·8
S to T    100·0  81·7   70·7   63·2   57·7   53·5   50·0   47·1   44·7
S to F    63·7   39·0   29·9   25·1   22·0   19·7   18·1   16·9   15·9

The result obtained more directly for the relative efficiency of T to F may also be obtained 
using values given in (64). Asymptotic relative efficiencies expressed as percentages for 


values of t from 2 to 10 are given in Table 1. 
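The first two rows of Table 1 follow directly from (58) and (65). A minimal Python sketch (the function names are hypothetical) that evaluates these two expressions as percentages is:

```python
import math

def re_T_to_F(t):
    """Relative efficiency of paired comparisons to analysis of variance, from (58)."""
    return t / (math.pi * (t - 1))

def re_S_to_T(t):
    """Relative efficiency of the multi-binomial test to paired comparisons, from (65)."""
    return math.sqrt(2.0 / t)

for t in range(2, 11):
    print(t, round(100 * re_T_to_F(t), 1), round(100 * re_S_to_T(t), 1))

# As t increases without limit, t / (pi (t - 1)) tends to 1 / pi.
print(round(100 / math.pi, 1))
```

The limiting value $1/\pi$ accounts for the final entry in the first row of the table.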

When t = 2, the method of paired comparisons reduces to the statistical sign test and the relative efficiency (58) is $2/\pi$ as obtained by Dixon & Mood. The method of paired comparisons becomes more and more inefficient relative to the analysis of variance as t increases. This was to be expected in view of the work of Dixon & Mood for the sign test, and since the comparison was made for a situation in which the analysis of variance could appropriately be used. The assumptions of the analysis of variance are invalidated in much of the work on sensory difference and subjective testing, and it was to avoid those restrictive assumptions that the method of paired comparisons was developed. Further, although for t = 2, the tabled relative efficiency of T to F is only 63·7 %, Walsh (1946) has shown that, for samples of size n = 4, 5 and 6, the sign test is approximately 95 % as efficient as the 'Student' t-test
and, although the relative efficiency decreases as ?t increases, itis approximate y : = when 
n = 13. Dixon & Mood (1946) and Dixon (1953) also indicate that the efficiency of the sign test is better for smaller sample sizes. It seems safe to assume that this situation also holds for t > 2 in paired comparisons.

The multi-binomial procedure is seen to have rapidly decreasing relative efficiency as t increases in comparison with both the method of paired comparisons and the analysis of variance.




VI. ILLUSTRATIVE EXAMPLES 
6·1. Estimated variances and covariances in a numerical example

We shall use the data from the taste-testing experiment on the flavour characteristics of pork roasts as given as an example by Bradley & Terry (1952). We now assume that the data for the two judges may be pooled. The pork roasts were obtained from hogs fed corn rations and peanut supplements C, Cp and CP. The experiment, with the results of the two judges pooled, yielded the set of sums of ranks, 32, 28 and 30, respectively, for the rations while n = 10. The use of the tables in that paper showed the following values:

$\sum r_1$   $\sum r_2$   $\sum r_3$   $p_1$   $p_2$   $p_3$   Prob.
32      28      30      0·24    0·43    0·32    0·63

This result is clearly non-significant at any realistic level of significance and is not indicative of any real ration effect on the flavours of the roasts.

We now obtain estimates of the variances and covariances of $\sqrt{n}\,p_1$, $\sqrt{n}\,p_2$ and $\sqrt{n}\,p_3$ or $\sqrt{n}\,(p_1 - \pi_1)$, $\sqrt{n}\,(p_2 - \pi_2)$ and $\sqrt{n}\,(p_3 - \pi_3)$. The first computing step is to obtain values of $\lambda_{ij}$ from substitution of values of $p_i$ for $\pi_i$ in (14). Then,
$$\lambda_{11} = 8·243, \quad \lambda_{22} = 2·566,$$
$$\lambda_{12} = -2·228, \quad \lambda_{23} = -1·778,$$
$$\lambda_{13} = -3·189, \quad \lambda_{33} = 4·780.$$
The estimate of the determinant in the denominator of (23) is
$$\begin{vmatrix} 8·243 & -2·228 & -3·189 & 1 \\ -2·228 & 2·566 & -1·778 & 1 \\ -3·189 & -1·778 & 4·780 & 1 \\ 1 & 1 & 1 & 0 \end{vmatrix} = -154·976.$$
Now, for example, from (23),
$$\hat\sigma_{12} = \frac{1}{154·976} \begin{vmatrix} -2·228 & -1·778 & 1 \\ -3·189 & 4·780 & 1 \\ 1 & 1 & 0 \end{vmatrix} = -0·0485,$$
and similarly, the complete set of estimated variances and covariances is
$$\hat\sigma_{11} = 0·0703, \quad \hat\sigma_{22} = 0·1252,$$
$$\hat\sigma_{12} = -0·0485, \quad \hat\sigma_{23} = -0·0767,$$
$$\hat\sigma_{13} = -0·0218, \quad \hat\sigma_{33} = 0·0985.$$
The estimated variances and covariances of $p_1$, $p_2$ and $p_3$ may be obtained by dividing those immediately above by n = 10. Consequently, the estimated standard errors of $p_1$, $p_2$ and $p_3$ are respectively given approximately by 0·084, 0·112 and 0·099.

A check on the computing may be made by computing the variance of $\sqrt{n} \sum_i p_i$ in terms of the variances and covariances of the elements of this sum. The result of course should be zero.
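The computation in this example can be reproduced mechanically. The Python sketch below starts from the six printed values of $\lambda_{ij}$, borders the matrix with a row and column of ones as in the determinant displayed above, and forms each estimate as a signed cofactor divided by the bordered determinant. This mirrors the worked value of $\hat\sigma_{12}$; the function names, and the reading of (23) as this cofactor ratio for every pair (i, j), are assumptions of the sketch.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (adequate for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def covariance_matrix(lam):
    """Signed cofactors of the bordered matrix divided by its determinant."""
    t = len(lam)
    bordered = [list(row) + [1.0] for row in lam] + [[1.0] * t + [0.0]]
    d = det(bordered)
    sigma = []
    for i in range(t):
        row = []
        for j in range(t):
            minor = [r[:j] + r[j + 1:] for k, r in enumerate(bordered) if k != i]
            row.append((-1) ** (i + j) * det(minor) / d)
        sigma.append(row)
    return sigma, d

lam = [[8.243, -2.228, -3.189],
       [-2.228, 2.566, -1.778],
       [-3.189, -1.778, 4.780]]
sigma, d = covariance_matrix(lam)
n = 10
std_errors = [math.sqrt(sigma[i][i] / n) for i in range(3)]
print(round(d, 3), [round(s, 3) for s in std_errors])
```

Each row of the resulting matrix sums to zero, which is the computing check described in the text.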




6·2. Use of the power function
(i) Suppose that the true values of the parameters $\pi_1$, $\pi_2$ and $\pi_3$ in the example of the preceding section are respectively 0·28, 0·59 and 0·13. (If these were estimates with n = 10, the significance level of the test would be 0·04.) We use $\delta_{in}$ defined in (34) as an approximation to $\delta_i$ and from (45) and (44) we have the approximate values
$$\lambda = 7·43 \quad\text{and}\quad \phi = 1·57.$$
The approximate power of a 0·05-level test, obtained from Tang's tables* with $f_1 = 2$, $f_2 = \infty$, $\alpha = 0·05$, is $\beta(7·43 \mid 0·05, 2, \infty) = 0·67$ and with $\alpha = 0·01$, $\beta(7·43 \mid 0·01, 2, \infty) = 0·56$ in the notation of (43). If we are interested in the probability of failing to reject $H_0$ when $H_1$ is true, we require $1 - \beta$. For $\alpha = 0·05$, the probability of a type II error is 0·33 and, for $\alpha = 0·01$, the probability of a type II error is 0·44.

(ii) Suppose that the differences observed in the estimates for the pork roast experiment are of practical importance (as distinct from statistical significance) and that for a 0·05-level test we desire the sample size for a second follow-up experiment such that $\beta = 0·95$ or such that the probability of a type II error will be about 0·05. Without specifying n, we take the values of $\pi_1$, $\pi_2$ and $\pi_3$ to be 0·24, 0·43 and 0·32 as estimated in the experiment. Then from (45), $\lambda = 0·123n$ and $\phi = \sqrt{0·123n/3}$.

To determine n, we enter Tang's tables with $f_1 = 2$, $f_2 = \infty$, $\alpha = 0·05$ and find that $\phi$ must have the value 2·35 for $\beta(0·123n \mid 0·05, 2, \infty) = 0·95$. The number of repetitions required is obtained by setting
$$0·123n/3 = (2·35)^2$$
from whence n = 135.

This is of course a very large number of repetitions and is required owing to the small differences among the estimates of the experiment.

If we take the hypothetical case considered in (i) above and the same test requirements,
$$0·743n/3 = (2·35)^2 \quad\text{and}\quad n = 22.$$
This value may again seem high but it must be emphasized that most experiments are conducted without regard to the powers of the tests employed and this is true even when the analysis of variance is used.
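The two sample-size calculations reduce to solving $\phi^2 = \lambda/t$ for n, with $\lambda$ proportional to n as in (45). A short Python sketch (function names hypothetical; the value $\phi = 2·35$ is the one read from Tang's tables in the text, not computed here) is:

```python
def lambda_per_repetition(pi):
    """Coefficient of n in (45): (1/4) t^3 sum_i (pi_i - 1/t)^2."""
    t = len(pi)
    return 0.25 * t**3 * sum((p - 1.0 / t) ** 2 for p in pi)

def repetitions_needed(pi, phi):
    """Solve phi^2 = lambda / t for n, where lambda = n * lambda_per_repetition(pi)."""
    t = len(pi)
    return t * phi**2 / lambda_per_repetition(pi)

phi_required = 2.35                    # value taken from the text
print(round(repetitions_needed([0.24, 0.43, 0.32], phi_required)))
print(round(repetitions_needed([0.28, 0.59, 0.13], phi_required)))
```

Rounding to the nearest integer is used here to match the figures quoted in the text; rounding up would be the more cautious choice in practice.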
VII. DISCUSSION AND SUMMARY
7·1. Additional test procedures
Two test procedures relating to paired comparisons were discussed by Bradley & Terry (1952) and have not yet been considered in this paper.

The first test is a combined test on the equality of treatment effects with the specification of the alternative hypothesis such that the parameters may differ from one group of repetitions to another. Suppose the paired comparisons experiment is performed in g groups of repetitions with $n_u$ repetitions in the uth group, $\sum_{u=1}^{g} n_u = n$, and suppose that the treatments have parameters, $\pi_{1u}, \ldots, \pi_{tu}$, $\sum_i \pi_{iu} = 1$, in the uth group. The hypotheses are
$$H_0: \pi_{iu} = 1/t \quad (i = 1, \ldots, t;\ u = 1, \ldots, g)$$

* Tang (1938) tabulated $P_{II}$, the probability of a type II error. In using the tables, linear interpolation was judged to be satisfactory for our examples.



and $H_1$: No $\pi_{iu}$ assumed equal to any other $\pi_{jv}$ ($i \ne j$ or $u \ne v$; $i, j = 1, \ldots, t$; $u, v = 1, \ldots, g$). Let $T_u$ be the statistic corresponding to T for the uth group. Then
$$T_c = \sum_{u=1}^{g} T_u$$
is the appropriate statistic for this test. From the additive property of $\chi^2$, it follows that under $H_0$ $T_c$ has, for large n and fixed ratios $n_u/n$, the central $\chi^2$-distribution with $g(t-1)$ degrees of freedom and, under $H_1$, $T_c$ has the non-central $\chi^2$-distribution with $g(t-1)$ degrees of freedom and parameter of non-centrality approximated by
$$\lambda_c = \tfrac14 t^3 \sum_{u=1}^{g} n_u \sum_{i=1}^{t} (\pi_{iu} - 1/t)^2.$$

The second test is a test of agreement of ranking from group to group. The hypotheses are $H_0$: $\pi_{iu} = \pi_i$ $(i = 1, \ldots, t;\ u = 1, \ldots, g)$ and $H_1$ as specified in the preceding paragraph. Let $T_p$ be the T-statistic obtained by pooling all groups of repetitions as though we have a homogeneous set of n repetitions. The appropriate statistic for this second test is $(T_c - T_p)$. This statistic has the central $\chi^2$-distribution with $(g-1)(t-1)$ degrees of freedom under $H_0$. Under $H_1$, we state without proof that the distribution of $(T_c - T_p)$ is the non-central $\chi^2$-distribution with $(g-1)(t-1)$ degrees of freedom and parameter of non-centrality approximated by


Ay = У Xin (5, — 2). 
iu-l 


7·2. Discussion

The results obtained in this paper are valid only for large samples. For small samples they may be approximately correct, but the difficulty of obtaining exact powers makes a check very nearly impossible. Under $H_0$ some information is available on the approach of the distribution of T to the central $\chi^2$-distribution. The means and variances of T were computed from the exact small-sample distribution of T (Bradley, 1954b) and their approach to the corresponding moments of $\chi^2$ noted. It appeared that the approximate tests would be adequate at least for n > 10.

In extreme cases, the estimators $p_1, \ldots, p_t$ sometimes have the set of values 1, 0, ..., 0. This prohibits the computation of $\lambda_{ij}$ and the estimation of variances and covariances of the estimators. This situation was previously discussed (Bradley, 1954b), and it was noted that the first treatment could be eliminated from the analysis and that secondary parameter estimates could be obtained for $\pi_2, \ldots, \pi_t$. This procedure again seems suitable when the specified set of values occurs. Then it will be possible to proceed with the analysis and the estimation of variances and covariances for the remaining estimates. Other extreme sets of sums of ranks yield extreme sets of estimators and methods of dealing with them are discussed in the cited reference.

The comparisons of the properties of the method of paired comparisons and of other test methods produce results that seem to be about what one would expect. The method of paired comparisons is clearly superior to the multi-binomial procedure both on the basis of the asymptotic powers of the tests for equal sample sizes and the study of their relative efficiency. It appears that the superiority of paired comparisons will become more marked as the number of treatments is increased. The comparison of the method of paired comparisons with the analysis of variance is less favourable to the method of paired comparisons.



Here the comparison was made when the analysis of variance would be the appropriate method. It should be remembered that the method of paired comparisons was formulated largely for subjective tests and it is there that the assumptions of the analysis of variance are seriously suspect. Further, even if a scoring technique could be devised for which the analysis of variance could be used, it is quite possible that an over-all consideration of time of judging and analysis would dictate the use of more samples and the non-parametric method. The large-sample results indicate that for the usual numbers of treatments 2½ times as many samples are required for the method of paired comparisons as for the analysis of variance. It is conceivable in a tasting experiment, for example, that an individual can indicate rankings in a paired-treatment experiment as contrasted to scoring individual samples on an 11-point scale, say, with much less taste fatigue and with a speed ratio in favour of ranking in excess of the ratio of sample numbers.


7·3. Summary

In this paper we have examined some of the properties of the method of paired comparisons. The results obtained are asymptotically correct for large-sample sizes or for large numbers of repetitions of the paired-comparisons design. Formulae for the variances and covariances of estimates of treatment ratings, $\pi_1, \ldots, \pi_t$, have been obtained, and these were not heretofore available. It has been shown that statistics previously used for tests of significance have limiting non-central $\chi^2$-distributions, and the appropriate parameters of non-centrality are given.

It was found that in comparison with the analysis of variance the relative efficiency of the method of paired comparisons is $t/\{\pi(t-1)\}$, and, when t = 2, this has the value $2/\pi$ previously obtained in comparing the sign test with the 'Student' t-test. The method of paired comparisons was seen to be considerably better than a multi-binomial procedure postulated and the asymptotic powers of these two tests are plotted for a number of examples along with similar values for the analysis of variance.

Examples of the use of the power function developed were given in application to an experiment in taste testing. Estimated variances and covariances of the estimators of the example were computed.
REFERENCES 


ABELSON, R. M. & BRADLEY, R. A. (1954). A 2 × 2 factorial with paired comparisons. Biometrics, 10, 487.
BRADLEY, R. A. (1953). Some statistical methods in taste testing and quality evaluation. Biometrics, 9, 22.
BRADLEY, R. A. (1954a). Incomplete block rank analysis: on the appropriateness of the model for a method of paired comparisons. Biometrics, 10, 375.
BRADLEY, R. A. (1954b). Rank analysis of incomplete block designs. II. Additional tables for the method of paired comparisons. Biometrika, 41, 502.
BRADLEY, R. A. & TERRY, M. E. (1952). Rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika, 39, 324.
CHANDA, K. C. (1954). A note on the consistency and maxima of the roots of likelihood equations. Biometrika, 41, 56.
COCHRAN, W. G. (1937). The efficiencies of the binomial series tests of significance of a mean and of a correlation coefficient. J. R. Statist. Soc. 100, 69.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton: Princeton University Press.
DIXON, W. J. (1953). Power functions of the sign test and power efficiency for normal alternatives. Ann. Math. Statist. 24, 467.
DIXON, W. J. & MOOD, A. M. (1946). The statistical sign test. J. Amer. Statist. Ass. 41, 557.
FIX, EVELYN (1949). Tables of noncentral $\chi^2$. Univ. Calif. Publ. Statist. 1 (2).
HALD, A. (1952). Statistical Theory with Engineering Applications, § 9·9. New York: John Wiley and Sons, Inc.
HEMELRIJK, J. (1952). A theorem on the sign test when ties are present. Indag. Math. 14, 399.
HOPKINS, J. W. (1954). Incomplete block rank analysis: some taste test results. Biometrics, 10, 391.
MOSTELLER, F. (1951). Remarks on the method of paired comparisons: I. The least squares solution assuming equal standard deviations and equal correlations. Psychometrika, 16, 3.
NOETHER, G. (1955). On a theorem of Pitman. Ann. Math. Statist. 26, 64.
PEARSON, E. S. & HARTLEY, H. O. (1951). Charts for the power function for analysis of variance tests, derived from the non-central F-distribution. Biometrika, 38, 112.
PITMAN, E. J. G. (1948). Lecture Notes on Non-Parametric Statistics. Columbia University, New York.
TANG, P. C. (1938). The power function of analysis of variance tests with tables and illustrations of their use. Statist. Res. Mem. 2, 126.
TERRY, M. E., BRADLEY, R. A. & DAVIS, L. L. (1952). New designs and techniques for organoleptic testing. Food Tech., Lond., 6, 250.
WALD, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54, 426.
WALSH, J. (1946). On the power function of the sign test for slippage of means. Ann. Math. Statist. 17, 358.




A METHOD OF ASSIGNING CONFIDENCE LIMITS TO LINEAR 
COMBINATIONS OF VARIANCES 


By A. HUITSON, Ph.D.
Royal Naval Scientific Service


1. INTRODUCTION AND SUMMARY 


Linear combinations of variances occur frequently in analysis of variance work, when we are considering the estimation either of individual components of variance or of gross variability. Percentage points for such functions cannot be computed directly in the same manner as that adopted when the $\chi^2$ distribution is used to give percentage points in problems where only a single unknown variance is involved.

It is the purpose of the present paper (a) to derive a series approximation to the percentage points of functions of this type, (b) to investigate the numerical behaviour of the expansion in some particular cases and (c) to provide tables of the series expansion, suitable for use in certain practical problems. An example is given in which use is made of the tables, together with some general discussion of the single-factor type of experiment.


2. DEVELOPMENT OF A SERIES APPROXIMATION 


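Let us consider any number of unknown population variances which we shall denote by $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$.*

Suppose that the observed data provide estimates $s_i^2$ of these variances based on $f_i$ degrees of freedom respectively, and that these estimates are distributed independently of each other in the form
$$p(s_i^2) = \frac{1}{\Gamma(\tfrac12 f_i)} \left(\frac{f_i}{2\sigma_i^2}\right)^{\frac12 f_i} (s_i^2)^{\frac12 f_i - 1} \exp\left[-\frac{f_i s_i^2}{2\sigma_i^2}\right]. \qquad (1)$$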

We are Dis to assign confidence limits, calculable from the observations, to the 
expression E A,03, where À; ... A, are known arbitrary constants. In order to accomplish 


this, we shall ud a funetion y, which will satisfy the equation 


X Ast 
B| £.—— <y sh- P) | = P, (2) 
У мо? 


to denote ‘the probability of the relation in the bracket following’. It is 
function y having the required property, but as we shall 

find various functions which have approximately the 
to leave open the question as to whether our 


where Р, is used 
by no means obvious that there is a 
see later it is certainly possible to 
required properties. We shall be content 
problem may be solved exactly. 

* More strictly, in most practical applications, the $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$ are parameters which are
linear functions of more basic variances. It will be convenient, however, to refer to the $\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2$
themselves as variances since they have estimates $s_1^2, s_2^2, \ldots, s_r^2$ which are distributed in the form (1).
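The large-sample normal approximation to $y$ is

$$1 + \xi\,\frac{\left(2\sum \lambda_i^2 \sigma_i^4/f_i\right)^{1/2}}{\sum \lambda_i \sigma_i^2},$$

where $\xi$ represents the $P$th percentage point of the unit normal deviate. However, as the
variances are unknown, it will be convenient to take as our initial approximation to $y$

$$1 + \xi\,\frac{\left(2\sum \lambda_i^2 s_i^4/f_i\right)^{1/2}}{\sum \lambda_i s_i^2}. \qquad (3)$$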


All summations are to be taken over the range 1 to r.
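As a minimal sketch of the initial approximation (3), the following computes it from the constants, the variance estimates and their degrees of freedom. The function name and argument names are illustrative only, and $P$ is taken as a probability such as 0.95.

```python
from math import sqrt
from statistics import NormalDist


def initial_approximation(lams, s2, dof, P):
    """Initial approximation (3): 1 + xi * sqrt(2*sum(lam^2 s^4 / f)) / sum(lam s^2)."""
    xi = NormalDist().inv_cdf(P)  # P-th percentage point of the unit normal deviate
    num = sqrt(2.0 * sum(l * l * s * s / f for l, s, f in zip(lams, s2, dof)))
    den = sum(l * s for l, s in zip(lams, s2))
    return 1.0 + xi * num / den


# Single variance on 10 degrees of freedom, P = 0.95.
y_single = initial_approximation([1.0], [1.0], [10], 0.95)

# The two mean squares of the worked example in Section 6 (lambda_1 = lambda_2 = 1/2).
y_example = initial_approximation([0.5, 0.5], [47.34, 167.88], [120, 114], 0.95)
```

For the data of the worked example this first approximation is already close to the value 1.18 read from Table 2, as would be expected with degrees of freedom above 100.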
We shall now introduce the notation 


SH aie s КЕТА 
ева, зе АДЫ 


and denote the corresponding functions of ће unknown variances by Amn and у; respec- 
tively. Thus our initial approximation to y becomes 
1+ (29). Ё (4) 
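If we assume that $y$ can be written in the form
$$y = h_0 + h_1 + h_2 + h_3 + \ldots, \qquad (5)$$

where $h_i$ is of order $f^{-i/2}$, then the large-sample normal approximation implies that

$$h_0 = 1, \qquad h_1 = \xi\,\frac{\left(2\sum \lambda_i^2 s_i^4/f_i\right)^{1/2}}{\sum \lambda_i s_i^2}. \qquad (6)$$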


We shall now proceed to derive the next term, $h_2$, in the series expansion of $y$. Let us
consider the new variable


u= Sy +. g 


As $u$ is a function of the variance estimates, the moment-generating function of $u$ may be
found by averaging over the distributions of the $s_i^2$. On simplification we find the first three
cumulants of $u$ to be

Kı = -E /(2A,,) to order f-!, 
Ka = 2A, +4 2 (А„)® А, (А) 3} to order f-t, 
and К» = 8А, to order f-?. 

Thus we can see that $\kappa_1 = O(f^{-1/2})$, $\kappa_2 = O(f^{-1})$ and $\kappa_3 = O(f^{-2})$. It appears that $u$ has
cumulants which decrease in powers of $f$ such that $\kappa_r = O(f^{1-r})$.


Using the Cornish & Fisher expansion (1937) we then obtain after simplification the $P$th
percentage point of $u$, $u_P$ say, to be equal to


224-552] - to order () (8) 




But this is a function of the ratios of the unknown variances and thus is not of immediate
use to us. Consider any particular ratio, say $\sigma_m^2/\sigma_n^2$, where $\sigma_m^2$ and $\sigma_n^2$ are any two of the
$\sigma_1^2, \ldots, \sigma_r^2$. The mean of $s_m^2/s_n^2$ is equal to

$$\frac{\sigma_m^2}{\sigma_n^2}\left(\frac{f_n}{f_n - 2}\right).$$

Also the variance of $s_m^2/s_n^2$ is of the order of $f^{-1}$. Therefore neglecting terms of order $f^{-1/2}$
and lower we have in the probability sense $s_m^2/s_n^2 = \sigma_m^2/\sigma_n^2$. Thus, without affecting the
accuracy of equation (8) we can replace all the ratios of the unknown variances by the
ratios of their estimates, since all terms on the right-hand side of (8) are of the same order.

Therefore $\Lambda_{mn}$ will now become $V_{mn}$ and

7 
h= e| -siel-- (9) 


The fact that (9) is of order $f^{-1}$ and not of higher order is a verification that we have taken
the correct value for $h_1$. Thus the method may be regarded as self-checking, in that, in each
application, it checks the value which has been obtained previously and adds another term
to the series. Using the same method the next term of the series expansion for $y$, $h_3$, has
been found to be


һ = (20) {EL — $¥oa( Var) + Vas (Var)? — Veo Var) — 59033)? (91) 7°] 
+ £3[ — 800) + 25 — 532) + 892) *]. (10) 
However, the solution was carried no further as the work involved would have been too 
laborious. 


3. CHECKS 


The following algebraic checks have been carried out on the series expansion of y derived 
above. 

(a) It has been shown to agree with an independent development of the series solution
carried through by Welch (1956) for the same problem, although his treatment and the
algebra involved are somewhat different in detail from mine.

(b) It has also been shown to agree with a result published by Bartlett (1953). In this
paper Bartlett derives a confidence interval for a function of the form $\sigma_1^2 + \lambda\sigma_2^2$. To the
appropriate order of terms, $h_0 + h_1 + h_2$ is the solution of this equation.

(c) A simple check, which has also been carried out, is to consider the special case when all
the degrees of freedom except one are infinite. A series expansion for the function $\sum \lambda_i \sigma_i^2$ can
then be found from the $\chi^2$ distribution and this compared term by term with the series
derived above.

4. NUMERICAL TESTING
The series approximation given by $h_0 + h_1 + h_2 + h_3$ is applicable to the general problem
involving r variances. In order to carry out a numerical investigation of it, we shall limit
ourselves to the particular case r = 2 only and we shall assume both $\lambda$'s have the same sign.


We can now write

$$V_{mn} = \frac{c^m}{f_1^n} + \frac{(1-c)^m}{f_2^n}, \quad \text{where} \quad c = \frac{\lambda_1 s_1^2}{\lambda_1 s_1^2 + \lambda_2 s_2^2}.$$




For the cases in which both the $\lambda$'s are of the same sign, the permissible values of c lie
between 0 and 1.

Consider an approximation to the $P$th percentage point of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$
depending only on the sample statistic c (it will also depend on the probability P), g(c) say.
To test the accuracy of this approximation, we must evaluate

$$\Pr\left[\frac{\lambda_1 s_1^2 + \lambda_2 s_2^2}{\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2} < g(c)\right] = Q, \text{ say,} \qquad (11)$$

in order to see how close it is to P. Strictly, such a function as g(c) is not an approximation
to the $P$th percentage point of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$, as this is a function of the
unknown variances while g(c) is independent of these variances. It can only be regarded as
such an approximation in the sense that it makes the value of Q in equation (11) close to P.
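The probability Q in (11) can also be estimated by simulation rather than quadrature. The sketch below does this for $g(c) = h_0 + h_1$, that is, the initial approximation (3) written in terms of c. It is an illustration only: the function name is hypothetical, and `gamma` is assumed to denote the population analogue of c, namely $\lambda_1\sigma_1^2/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$.

```python
import random
from math import sqrt
from statistics import NormalDist


def estimate_q(P, gamma, f1, f2, n_trials=100_000, seed=1):
    """Monte Carlo estimate of Q in (11) with g(c) = h_0 + h_1."""
    rng = random.Random(seed)
    xi = NormalDist().inv_cdf(P)
    hits = 0
    for _ in range(n_trials):
        # Each term lambda_i s_i^2, in units of lambda_1 sigma_1^2 + lambda_2 sigma_2^2;
        # s_i^2 / sigma_i^2 is chi-square on f_i degrees of freedom divided by f_i.
        a = gamma * rng.gammavariate(f1 / 2.0, 2.0) / f1
        b = (1.0 - gamma) * rng.gammavariate(f2 / 2.0, 2.0) / f2
        c = a / (a + b)
        g = 1.0 + xi * sqrt(2.0 * (c * c / f1 + (1.0 - c) ** 2 / f2))
        hits += (a + b) < g
    return hits / n_trials


q_hat = estimate_q(P=0.95, gamma=0.2, f1=10, f2=10)
```

For this case the quadrature value reported in the text is 0.9419, so a simulated value should fall near that figure, subject to sampling error.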
It can be shown by straightforward transformations that 


күн нш eed UE) a sb ___ 
SA Ж ГА ) OLD (12) 
т 1 a 
cae dal Jo түру dose aan, 
1 
2-Е at Ba 
p(b) db BUD 1(1— b)Èfs-1 db 
and -hin 


TEST 


From equation (12) the following values of Q were obtained by quadrature when
$f_1 = f_2 = 10$ with g(c) replaced by the series approximation, cut off at various points, in order
to see what effect each successive term of the series had on the result:


Values of Q when $f_1 = f_2 = 10$

                                       P = 0.95
                                       γ = 0.2

g(c) = h_0 + h_1                        0.9419
g(c) = h_0 + h_1 + h_2                  0.9533
g(c) = h_0 + h_1 + h_2 + h_3            0.9501


The values of Q for $\gamma = a$, say, and $\gamma = 1 - a$ will be identical when $f_1 = f_2$. Therefore the
values of Q when $\gamma = 0.8$ will correspond to those given in the third and last columns of
the above table. In all except the third column it is seen that the addition of each successive 
term to the series approximation brings the value of Q closer to the chosen value of P. In 
the third column the addition of h, has moved Q in the right direction, but too far. 

It may also be noted that the values of Q obtained when P = 0.05 are much more in error
than the corresponding ones for P = 0.95. This suggests that the series approximation may
not be so good at the lower end of the distribution as it is at the upper end.




This comparison relates only to the case r = 2, and $\lambda_1$ and $\lambda_2$ positive. More work would
be necessary to assess the merits of the series for the more general situation. The results for 
the case considered are, however, sufficiently promising to make it appear worth while to 
table some results for the larger numbers of degrees of freedom at least. Such tables are 
presented in the next section. 


5. TABLES 


A two-decimal table will usually be quite adequate for a working statistician. Four
probability levels are tabled in this paper (Tables 1-4).

We shall regard the series solution as giving us two-decimal accuracy when $h_3 < 0.005$.
The argument here is that if $h_3$ is negligible, then so, almost certainly, are higher terms of
the series. If $h_3$ is not negligible, it does not follow that the higher terms are not negligible,
but we have no absolute justification for assuming that they are. Using this criterion, the
tabled values will be correct, except possibly those for which one of the degrees of freedom
is 16, which may be one or two units in error in the second decimal place.

The values of degrees of freedom for which the tables have been constructed are such that
their square roots form a harmonic set. The purpose of this is to make interpolation with
respect to the degrees of freedom easier. Thus, for intermediate values of $f$, it is necessary
to interpolate with respect to $12/\sqrt{f}$. Tables 1-4 can be used to assign 90 and 98 % confidence
intervals to any sum of two variances, provided the degrees of freedom of the variance
estimates are large enough.
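The interpolation rule can be sketched as follows. The tabulated degrees of freedom 16, 36, 144 and infinity correspond to $12/\sqrt{f}$ = 3, 2, 1 and 0, so linear interpolation is carried out in that argument. The function names are illustrative, and the two tabular values used in the example are the entries of Table 4 at $f_2$ = 36 and 144 for c = 0.

```python
from math import inf, sqrt


def harmonic_argument(f):
    """Interpolation argument 12/sqrt(f); infinite degrees of freedom map to 0."""
    return 0.0 if f == inf else 12.0 / sqrt(f)


def interpolate_in_f(f, f_lo, v_lo, f_hi, v_hi):
    """Linear interpolation between two tabulated values with respect to 12/sqrt(f)."""
    x, x_lo, x_hi = (harmonic_argument(t) for t in (f, f_lo, f_hi))
    weight = (x - x_lo) / (x_hi - x_lo)
    return v_lo + weight * (v_hi - v_lo)


# f = 64 lies midway between 36 and 144 on the 12/sqrt(f) scale (1.5 between 2 and 1).
value_64 = interpolate_in_f(64, 36, 1.63, 144, 1.29)
```

Interpolation with respect to c can then be carried out on the values so obtained.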


6. EXAMPLE 


The following example is given by Tippett (1952, p. 111). It is of the single-factor analysis 
type. Mule cops (bobbins) of cotton yarn were collected in blocks of 20, each block being 
from a different mule. Two leas (1 lea = 120 yards) were weighed from each cop, giving for 
each block 39 total degrees of freedom, 19 ‘between cops’ and 20 ‘within cops’, together with 
corresponding sums of squares. There were six such blocks and the six sets of sums of squares 
and degrees of freedom were added to give the following table: 


Source of variation               Sum of squares    Degrees of freedom    Mean square

Between cops (within blocks)        19,138.85              114            167.88 = s_2^2
Within cops                          5,681.00              120             47.34 = s_1^2
Total                               24,819.85              234


Here the two components are the variance between cops and the interaction (or error)
variance, in the usual manner.

This is an analysis of the variations within the blocks, and these are interesting because
they are due to causes not easily controllable. The block to block variations are not studied
because they can be easily eliminated by careful adjustment of the mules.




Table 1. Lower 5% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$


Asi 
Asi Aas 


052 055 0:59 0-62 064 062 059 0 
0.07 069 071 0:70 0-66 062 058 055: 
0-82 081 0:76 0-71 066 062 058 0 
091 083 077 0:71 066 062 058 0 


0:52 055 058 062 066 0-70 0-71 069 
067 069 0-72 074 0-75 074 0-72 069 
0-83 083 0:83 081 078 075 0-72 0-69 
095 090 086 082 0-78 0-75 0-72 0-70 


0:52 055 058 062 066 0-71 0-76 0-81 
067 069 072 0-75 0-78 081 0-83 0-83 
0-83 084 086 086 087 086 0-86 0:84 
0-98 0-96 094 0-92 0:90 0:88 0-86 0:85 


0-52 055 058 0-62 066 0-71 0-77 0-83 
0-07 070 0-72 0-75 0-78 082 0-86 0:90 
0-83 0-85 086 0-88 0:90 0:92 0:94 0-96 
1:00 100 100 100 1:00 100 100 100 


Table 2. Upper 5% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$


150 146 144 143 144 146 150 
1:35 134 134 136 1:39 143 148 
120 122 1:95 1:29 133 139 1:46 
1:09 113 118 124 130 1:37 145 


148 143 139 136 134 134 135 
L33 131 1:29 129 129 131 133 
118 1:18 120 12 124 127 131 
106 110 113 117 121 1260 131 


146 1:39 133 129 125 122 1-20 
131 127 124 121 120 118 118 
116 115 114 114 114 115 116 
1:08 105 107 1:09 111 113 115 


145 1:37 130 1:24 118 113 109 
131 126 121 117 113 110 1:06 
115 113 111 1:09 107 1:05 1-03 
100 400 1:00 100 100 1:00 1:00 




Table 3. Lower 1% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$


Table 4. Upper 1% critical values of $(\lambda_1 s_1^2 + \lambda_2 s_2^2)/(\lambda_1\sigma_1^2 + \lambda_2\sigma_2^2)$


                        $\lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2)$
 f_1   f_2    0.0   0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9   1.0

  16    16   2.00  1.85  1.76  1.70  1.67  1.65  1.67  1.70  1.76  1.85  2.00
        36   1.63  1.55  1.52  1.50  1.51  1.54  1.58  1.63  1.71  1.83  2.00
       144   1.29  1.28  1.29  1.32  1.36  1.41  1.47  1.56  1.68  1.82  2.00
         ∞   1.00  1.06  1.11  1.18  1.25  1.33  1.42  1.53  1.66  1.82  2.00

  36    16   2.00  1.83  1.71  1.63  1.58  1.54  1.51  1.50  1.52  1.55  1.63
        36   1.63  1.54  1.49  1.45  1.43  1.42  1.43  1.45  1.49  1.54  1.63
       144   1.29  1.27  1.26  1.27  1.28  1.31  1.35  1.40  1.46  1.54  1.63
         ∞   1.00  1.04  1.08  1.13  1.18  1.24  1.30  1.37  1.45  1.53  1.63

 144    16   2.00  1.82  1.68  1.56  1.47  1.41  1.36  1.32  1.29  1.28  1.29
        36   1.63  1.54  1.46  1.40  1.35  1.31  1.28  1.27  1.26  1.27  1.29
       144   1.29  1.26  1.24  1.22  1.21  1.20  1.21  1.22  1.24  1.26  1.29
         ∞   1.00  1.02  1.05  1.07  1.10  1.13  1.16  1.19  1.22  1.26  1.29

   ∞    16   2.00  1.82  1.66  1.53  1.42  1.33  1.25  1.18  1.11  1.06  1.00
        36   1.63  1.53  1.45  1.37  1.30  1.24  1.18  1.13  1.08  1.04  1.00
       144   1.29  1.26  1.22  1.19  1.16  1.13  1.10  1.07  1.05  1.02  1.00
         ∞   1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00



'Between cops' corresponds to the factor and 'within cops' to the error; but it is
inappropriate to speak of error when the investigation is not a controlled, or partially
controlled, experiment.

Suppose that we wish to find confidence limits for the gross variability (i.e. the variance
of a single observation). The gross variability is $\tfrac12\sigma_1^2 + \tfrac12\sigma_2^2$, and its estimate is

$$\lambda_1 s_1^2 + \lambda_2 s_2^2 = \tfrac12(47.34) + \tfrac12(167.88) = 107.61.$$

Then $c = \lambda_1 s_1^2/(\lambda_1 s_1^2 + \lambda_2 s_2^2) = 0.2200$, $f_1 = 120$, $f_2 = 114$.
Interpolating in Table 1 we get a value of 0.83, and from Table 2 we get 1.18. Hence

$$\Pr\left\{0.83 < \frac{107.61}{\tfrac12\sigma_1^2 + \tfrac12\sigma_2^2} < 1.18\right\} = 0.90,$$

or

$$\Pr\left\{\frac{107.61}{1.18} < \tfrac12\sigma_1^2 + \tfrac12\sigma_2^2 < \frac{107.61}{0.83}\right\} = 0.90,$$

or

$$\Pr\left\{91 < \tfrac12\sigma_1^2 + \tfrac12\sigma_2^2 < 130\right\} = 0.90.$$

Thus 90 % confidence limits for $\sqrt{(\tfrac12\sigma_1^2 + \tfrac12\sigma_2^2)}$ are approximately 9.5 and 11.4.
In a similar manner 98 % confidence limits can be found by using Tables 3 and 4.
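The arithmetic of this example can be retraced in a few lines. The sketch below takes the two critical values 0.83 and 1.18 as read from Tables 1 and 2 and reproduces the remaining steps; the variable names are illustrative only.

```python
from math import sqrt

# Mean squares from the analysis of variance table and the weights lambda_1 = lambda_2 = 1/2.
s1_sq, s2_sq = 47.34, 167.88
lam1 = lam2 = 0.5

estimate = lam1 * s1_sq + lam2 * s2_sq  # estimate of the gross variability
c = lam1 * s1_sq / estimate             # argument used to enter the tables

# Critical values obtained by interpolation in Tables 1 and 2 (taken from the text).
lower_crit, upper_crit = 0.83, 1.18

variance_limits = (estimate / upper_crit, estimate / lower_crit)
sd_limits = tuple(sqrt(v) for v in variance_limits)
```

The limits for the variance come out near 91 and 130, and those for the standard deviation near 9.5 and 11.4, in agreement with the figures above.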


7. DISCUSSION


In order to use the tables given in this paper in experiments of the type described above, it is
necessary that both rows of the analysis of variance table should have degrees of freedom as 
large as 16. This can be ensured by making the number of levels of the factor large enough. 
If the number of levels is small we shall not get a good estimate of the variability between 
levels, and hence we shall be led to wide confidence limits for the gross variability. When the 
number of levels is very small, they will be so wide that it will not be worth spending time 
estimating them. Thus it is desirable that the number of levels should be as large as possible. 

The ideal experiment for this type of analysis will have two replications for each level
and a large number of levels, so that both lines of the analysis of variance table are of
comparable accuracy. However, if the cost of increasing the number of replications is negligible,
there is no harm done if more than two are made. It will be advantageous to make the number
of replications reasonably large, if an accurate estimate of the interaction variance is also
required from the analysis.

Although the number of levels, K, should be large, experiments are often made in which
K is small. For example, in chemical and engineering tests, K has an average of about 10,
rarely rising above 20. There are two reasons for this use of a small number of levels of the
factor:

(i) The cost of increasing K is usually greater than the cost of increasing the number of
replications. An example of this is found in agricultural field trials where the levels of a
factor correspond to farms. Here the cost of transporting personnel and equipment to an
extra farm will be greater than the cost of taking an extra observation on the farms which
have already been chosen for the experiment.

(ii) There may be more trouble involved in increasing K. For example, a manufacturer
may send samples of his product to each of K laboratories, n tests being made on the sample
in each laboratory. It will be much easier to carry out a few extra tests than to get a larger
number of laboratories to co-operate.




However, there is a way in which large values of the degrees of freedom can arise. The 
experiment can be repeated several times and the results of the separate experiments 
combined. This, in fact, was done in the example given above. I think that this technique 
could profitably be applied to other experiments in which high accuracy is required. 


8. CONCLUSION 


A series expansion suitable for estimating confidence limits for a general linear
combination of variances has been derived as far as the term of order $f^{-3/2}$. In numerical work
for the case of a positive combination of two variances, it has shown itself to be remarkably 
good, the best results being obtained at the upper end of the distribution. In view of this 
fact, tables of the 1, 5, 95 and 99 % points have been calculated and are given in this paper. 
They are suitable for estimating confidence intervals for the sum of two variances, if the 
degrees of freedom are large enough. An example of their use is given in the text. 


REFERENCES 


BARTLETT, M. S. (1953). Approximate confidence intervals. II. More than one unknown parameter.
Biometrika, 40, 306-17.

CORNISH, E. A. & FISHER, R. A. (1937). Moments and cumulants in the specification of distributions.
Extrait de la Revue de l'Institut International de Statistique, 4, 1-14. Reprinted as paper 30 in
Contributions to Mathematical Statistics, by R. A. Fisher (1950). John Wiley and Sons, Inc.

TIPPETT, L. H. C. (1952). Technological Applications in Statistics. Williams and Norgate Ltd.

WELCH, B. L. (1956). On linear combinations of several variances. J. Amer. Statist. Ass. 51, Part 1.




INTERPOLATIONS AND APPROXIMATIONS RELATED 
TO THE NORMAL RANGE 


By JOHN W. TUKEY* 
Princeton University 


Many quantities related to the normal range are only well known for a rather open grid of 
values, and interpolation is often considered difficult (vide David, Hartley & Pearson’s 
(1954) remark at p. 486). Methods of convenient interpolation should thus be of use. 

Elfving (1947) established a general asymptotic result for ranges of samples from
symmetrical distributions. For the normal distribution this implies that

$$n\int_{w/2}^{\infty} e^{-x^2/2}\,dx$$

has a limiting distribution. The usual limiting value of Mill's ratio yields

$$\int_{u}^{\infty} e^{-x^2/2}\,dx \sim u^{-1}e^{-u^2/2}.$$

When we combine these and take logarithms, we learn that

$$w^2 + 8\ln w - 8\ln n$$

has a limiting distribution. We might then expect, when the middle term could be neglected,
that $w^2$ would behave like $8\ln n = 18.42\log_{10} n$, while the reciprocal of the variance of $w$
would be proportional to

$$\left(\frac{d(w^2 + 8\ln w)}{dw}\right)^2 = 4\left(w + \frac{4}{w}\right)^2 \propto 18.42\log_{10} n + 8.$$

Thus the use of $\log_{10} b(n + c)$
would seem to offer promise as a basis for interpolation.
With suitable choices of b and c, the ratios of suitable powers of:
(1) $w_{5\%}$ and $w_{1\%}$, the upper 5 and 1 % points of the normal range,
(2) $d_n$ and $V_n$, the average value and variance of the normal range,
(3) the ratio $d_n/\sqrt{V_n}$, and
(4) $(w/s)_{5\%}$ and $(w/s)_{1\%}$, the upper 5 and 1 % points of the ratio of a range to s from the
same sample,
to $\log_{10} b(n + c)$ are remarkably constant. This is shown in Table 1, where the values of n ≥ 20
are those most likely to be key values in view of the existence of standard deviations for
the normal range for these values. The ratios have been carried to enough decimal places
to show either trends in the ratio, or irregularities which reflect limitations in the available
tables.
The constancy of the ratios is so great as to provide not only interpolations but simple and
rather accurate approximations. With the growing use of internally programmed automatic
computers, the already substantial convenience of such simple analytical approximations
is being substantially enhanced.


* Prepared in connexion with research sponsored by the U.S. Office of Naval Research.

In view of the surprisingly good approximations obtained by choosing b and c properly,
it is important to emphasize that useful interpolation is also quite possible and even simpler,
since division by the square root of $\log_{10} n$ will make almost all of the quantities treated easily
interpolable for n between 20 and 1000.

The 1 % points of the range for n < 20 were taken from Table 22 of the new Biometrika
tables (based in part on Pearson (1942)), while the corresponding 5 % points were
interpolated in Table 23 to give a third decimal; values for 30 < n < 100 were taken from Pearson
(1932), while values for n = 200, 500 and 1000 were computed from the four moments found
by Tippett (1925), using Table 42 of the Biometrika tables (based in part on Pearson &
Merrington (1951)). The values so found were:


(the values for n = 100 being included for comparison with Pearson's values of 6.08 and 6.63).
The approximations (corresponding to 'std' in Table 1)

$$w_{5\%} = \{17.0\log_{10}(1.5n)\}^{1/2},$$

$$w_{1\%} = \{17.5\log_{10}(3.3n)\}^{1/2},$$

seem to be about as accurate as the tabulated values for 20 < n < 1000 and to be satisfactory
for n > 6. For some purposes, it is more convenient to write the approximations in the form
(corresponding to 'alt' in Table 1)

$$(w_n)_{5\%}/(w_2)_{5\%} = 1.490\{\log_{10}(1.5n)\}^{1/2},$$

$$(w_n)_{1\%}/(w_2)_{5\%} = 1.51\{\log_{10}(3.3n)\}^{1/2}.$$
The average values of the normal range, $d_n$, were taken from Table 27 of the Biometrika
tables (based on Tippett (1925)). The approximation ('std' in Table 1)

$$d_n = \{17.06\log_{10} 0.291(n + 2.6)\}^{1/2}$$

is accurate to 1 part in 1500 for 20 < n < 1000, while interpolation in Table 1 for a more
precise ratio than 17.06 will easily give an accuracy of 1 in 20,000. The alternate
approximation, which in this case agrees with the approximation just given to 4 significant figures, is

$$d_n/d_2 = 3.66\{\log_{10} 0.291(n + 2.6)\}^{1/2}.$$

The variances of the normal range, $V_n$, were taken from Table 20 of the Biometrika tables
(based on Hartley & Pearson (1951)) for n < 20, from Tippett (1925), as repeated on p. 45
of the Biometrika tables, for n = 60, 100, 200, 500 and 1000, and from Pearson (1932) for
n = 30, 45 and 75. The approximation

$$1/V_n = 1.338\log_{10} 1.05(n + 4.4)$$

is apparently about as good as the tabulated values for 20 < n < 1000 and is quite useful for
n > 6. The alternate form differs by about 2 parts in 1300 and is

$$V_n/V_2 = 1.03/\log_{10} 1.05(n + 4.4).$$
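These 'std' approximations are simple enough to be evaluated directly. The sketch below assumes the formulae read $w_{5\%} = \{17.0\log_{10}(1.5n)\}^{1/2}$, $w_{1\%} = \{17.5\log_{10}(3.3n)\}^{1/2}$, $d_n = \{17.06\log_{10} 0.291(n+2.6)\}^{1/2}$ and $1/V_n = 1.338\log_{10} 1.05(n+4.4)$; the function names are illustrative.

```python
from math import log10, sqrt


def range_upper_5pct(n):
    """Approximate upper 5 % point of the normal range for sample size n."""
    return sqrt(17.0 * log10(1.5 * n))


def range_upper_1pct(n):
    """Approximate upper 1 % point of the normal range for sample size n."""
    return sqrt(17.5 * log10(3.3 * n))


def range_mean(n):
    """Approximate average value d_n of the normal range."""
    return sqrt(17.06 * log10(0.291 * (n + 2.6)))


def range_variance(n):
    """Approximate variance V_n of the normal range."""
    return 1.0 / (1.338 * log10(1.05 * (n + 4.4)))


w5_100, w1_100 = range_upper_5pct(100), range_upper_1pct(100)
d_20, v_20 = range_mean(20), range_variance(20)
```

For n = 100 the two percentage points come out close to Pearson's values of 6.08 and 6.63 quoted above. The text recommends these forms chiefly for n between 20 and 1000.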


Table 1. Ratios of various quantities to $\log_{10}(b(n + c))$ for selected values of n


   n    [w_5%]^2  [w_1%]^2   d_n^2   d_n/√V_n  d_n/√V_n   1/V_n   [(w/s)_5%]^2  [(w/s)_1%]^2

   b      1.5       3.3      0.291    0.52      0.494      1.05       0.78          1.1
   c      0.0       0.0      2.6      2.5    1.535 n^(1/4) 4.4       -4.5          -7.4

   2     16.10     16.17    10.0      3.58      6.84       1.66      Imag.         Imag.
   3     16.82     17.05    13.5      4.18      4.900      1.423     Imag.         Imag.
   4     16.9      17.28    15.0      4.42      4.8340     1.366     Imag.         Imag.
   5     17.000*   17.38    15.69     4.56      4.8347     1.3440    Imag.         Imag.
   6     17.035*   17.47    16.12     4.630     4.8349     1.3343    133.          Imag.
   8     17.037*   17.52    16.57     4.711     4.8351     1.3348     39.           86.
  10     17.026*   17.53    16.78     4.7497    4.8351     1.3340     21.           32.
  15     17.041*   17.53    16.99     4.7871    4.8353     1.3359     19.04         21.8
  20     17.006*   17.54    17.054    4.7984    4.8352     1.3370     18.62         20.10
  30     17.06     17.50    17.082    4.8034    4.8345     1.3378    (18.41)       (19.78)
  45     16.96     17.47    17.079    4.8029    4.8341     1.3382     n/a           n/a
  60     16.98     17.50    17.070    4.8015    4.8343     1.3382     18.49         19.95
  75     17.03     17.49    17.062    4.8001    4.8336     1.3377     n/a           n/a
 100     16.99†    17.45‡   17.052    4.8008    4.8355     1.3393     18.59         20.14
 200     17.00     17.43    17.036    4.7979    4.8345     1.3388     18.64         20.17
 500     17.04     17.48    17.044    4.7951    4.8317     1.3370     18.62         20.14
1000     17.10     17.49    17.056    4.8008    4.8358     1.3396     18.59         20.03

 std     17.0      17.5     17.06     4.8       4.835      1.3380     18.6          20.1
 alt     17.5      17.52    17.06      -         -         1.336      18.49         20.25


* Based on interpolation in the cumulative distribution to a 3rd decimal.
† 17.04 by method used for n > 100.
‡ 17.40 by method used for n > 100.
Imag. = imaginary, n/a = no percentage point tabulated.
Values in ( ) based on interpolations by Pearson (1932).

Two sorts of approximations to the reciprocal of the coefficient of variation of the range
are given. The first yields

$$d_n/\sqrt{V_n} = 4.8\log_{10} 0.52(n + 2.5),$$

and is accurate to the accuracy of the known values or to 1 part in 1000 for n > 20. The
second, which illustrates an altered form for c, yields

$$d_n/\sqrt{V_n} = 4.835\log_{10} 0.494(n + 1.535\,n^{1/4}),$$

which is accurate to the accuracy of the known values or to 1 part in 4000 for n ≥ 4. (No
alternatives are provided.)
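The two forms may be compared numerically. In the sketch below the altered form of c is assumed to be $1.535\,n^{1/4}$ (the radical is not fully legible in the formula, but a fourth root reproduces the ratio tabulated for n = 4); the function names are illustrative.

```python
from math import log10


def mean_over_sd_first(n):
    """First approximation to d_n / sqrt(V_n): 4.8 log10 0.52(n + 2.5)."""
    return 4.8 * log10(0.52 * (n + 2.5))


def mean_over_sd_second(n):
    """Second approximation, with c assumed to be 1.535 * n**(1/4)."""
    return 4.835 * log10(0.494 * (n + 1.535 * n ** 0.25))


first_20, second_20 = mean_over_sd_first(20), mean_over_sd_second(20)
```

At n = 20 the two forms agree to within a few parts in ten thousand, which is in line with the accuracies claimed for them.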




The percentage points of the ratio of a range to the standard-deviation estimate, s, from
the same sample were taken from the recent paper by David et al. (1954). The
approximations

$$(w/s)_{5\%} = \{18.6\log_{10} 0.78(n - 4.5)\}^{1/2},$$

$$(w/s)_{1\%} = \{20.1\log_{10} 1.1(n - 7.4)\}^{1/2},$$

are not quite as good as the key values, the ratios in Table 1 showing a tendency to run low
near n = 50, and, probably, high near n = 300. The amount of this effect is near the limit
of accuracy, and should not amount to more than about 1 part in 400 for 20 ≤ n ≤ 1000.
About as much more error is introduced by two-significant-figure constants in the
alternate forms

$$(w/s)_{5\%} = 4.3\{\log_{10} 0.78(n - 4.5)\}^{1/2},$$

$$(w/s)_{1\%} = 4.5\{\log_{10} 1.1(n - 7.4)\}^{1/2}.$$
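A minimal sketch of the 'std' and 'alt' forms for the ratio of range to s follows, assuming the constants read as 18.6, 0.78, 4.5 and 20.1, 1.1, 7.4 respectively. The function names are illustrative, and the forms are intended for 20 ≤ n ≤ 1000; for very small n the logarithm becomes negative and the square root is imaginary.

```python
from math import log10, sqrt


def range_over_s_5pct(n, alternate=False):
    """Approximate upper 5 % point of w/s ('std' form, or 'alt' form if requested)."""
    root = sqrt(log10(0.78 * (n - 4.5)))
    return 4.3 * root if alternate else sqrt(18.6) * root


def range_over_s_1pct(n, alternate=False):
    """Approximate upper 1 % point of w/s ('std' form, or 'alt' form if requested)."""
    root = sqrt(log10(1.1 * (n - 7.4)))
    return 4.5 * root if alternate else sqrt(20.1) * root


ws5_20, ws1_20 = range_over_s_5pct(20), range_over_s_1pct(20)
ws5_20_alt = range_over_s_5pct(20, alternate=True)
```

The two-figure constants 4.3 and 4.5 are simply rounded square roots of 18.6 and 20.1, so the alternate forms differ from the standard ones by a fixed small proportion.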


METHODS 


The approximations given are not best in any specific sense, but have been chosen to fit
reasonably well. Except for $w_{5\%}$ and $w_{1\%}$, where c = 0 works quite well, the values of b and
c were usually determined to make the ratios for n = 20, 100 and 1000 nearly the same,
which can be easily done by trial and error solution of

$$\log_{10}(100 + c) - \frac{\log_{10}(1000 + c) - \log_{10}(100 + c)}{(q_{1000}/q_{100}) - 1} = \log_{10}(20 + c) - \frac{\log_{10}(1000 + c) - \log_{10}(20 + c)}{(q_{1000}/q_{20}) - 1},$$

where $q_n$ denotes the quantity being fitted at sample size n,
the common value of these differences being taken as $-\log_{10} b$.
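The trial and error solution can be mechanized by bisection on c. The sketch below is one way of doing so and is not necessarily the procedure actually followed; `q20`, `q100` and `q1000` stand for the quantity being fitted (for example $d_n^2$) at the three key sample sizes, and the bracket for c is an assumption that may need widening.

```python
from math import log10


def fit_b_c(q20, q100, q1000, c_lo=0.0, c_hi=10.0, tol=1e-10):
    """Find b, c and the common ratio K such that q_n = K log10(b(n + c)) at n = 20, 100, 1000."""

    def minus_log_b(n, qn, c):
        # Value of -log10(b) implied by pairing the key value at n with that at 1000.
        return log10(n + c) - (log10(1000 + c) - log10(n + c)) / (q1000 / qn - 1.0)

    def mismatch(c):
        return minus_log_b(100, q100, c) - minus_log_b(20, q20, c)

    f_lo, f_hi = mismatch(c_lo), mismatch(c_hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the bracket; widen c_lo, c_hi")
    while c_hi - c_lo > tol:
        mid = 0.5 * (c_lo + c_hi)
        f_mid = mismatch(mid)
        if f_lo * f_mid <= 0:
            c_hi = mid
        else:
            c_lo, f_lo = mid, f_mid
    c = 0.5 * (c_lo + c_hi)
    b = 10.0 ** (-minus_log_b(100, q100, c))
    ratio = q100 / log10(b * (100 + c))
    return b, c, ratio


# Synthetic check: key values generated from known constants should be recovered.
true_ratio, true_b, true_c = 17.06, 0.291, 2.6
keys = [true_ratio * log10(true_b * (n + true_c)) for n in (20, 100, 1000)]
b_fit, c_fit, ratio_fit = fit_b_c(*keys)
```

With real tabular values the three ratios will agree only approximately, and the constants would ordinarily be rounded afterwards.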


ILLUSTRATION OF APPROXIMATION 


To illustrate the use of the technique in more refined approximation, Fig. 1 shows values of

$$\operatorname{antilog}_{10}\left(d_n/4.835\sqrt{V_n}\right) - 0.496n - 0.6\log_{10} n$$

plotted against $\log_{10} n$. For larger values of n the dashed vertical lines indicate the
uncertainties associated with the fact that $V_n$ is given to only 3 decimals for n = 60, 100, 200, 500
and 1000, and to only 4 decimals for 30, 45 and 75. Interpolation to nearly 4-decimal accuracy
in $V_n$ is clearly possible from n = 20 nearly to n = 100. Better values for n's between 100
and 1000 would clearly be obtained from one or two 4 or 5 decimal values of $V_n$, rather than
by many more values of lower accuracy. A single highly accurate quadrature for an n of,
say, 1001 would probably go far to settle this situation.
The quantity plotted can be regarded as

$$0.496c' - 0.6\log_{10} n,$$

where

$$d_n/\sqrt{V_n} = 4.835\log_{10} 0.496(n + c').$$

Here 4.835 was chosen from the second approximation to $d_n/\sqrt{V_n}$, while 0.496 was modified
from 0.494 to give a relatively straight plot for n between 10 and 100. The use of $0.6\log_{10} n$
is not essential, and serves merely to simplify the plot somewhat.




A further use of the same technique is in supplementing the available values of the lower
percentage points of w/s. The lower 2.5 % point serves as a good example. Fitting to n = 10,
100 and 1000 yields

$$3.907\{\log_{10}[(n + 11.42)/7.79]\}^{1/2}$$

as a first approximation. Keeping the 3.907, and adjusting the 7.79 to 8.14 to produce
reasonable linearity against log n, leads to the use of

$$z = 8.14\operatorname{antilog}_{10}(y/3.907)^2 - n - 4.6\log_{10} n,$$

as a quantity of reasonable constancy, where y is the lower 2.5 % point of w/s.


Fig. 1. Residuals for $d_n/\sqrt{V_n}$, where $c'$ is defined by $d_n/\sqrt{V_n} = 4.835\log_{10} 0.496(n + c')$.
(Vertical axis: residual $= 0.496c' - 0.6\log_{10} n$; horizontal axis: sample size, n, on a logarithmic scale from 2 to 1000.)


The numerical values found are as follows: 


n 2 3 10 15 20 60 100 200 500 1000 
2 7:63 7-69 7-79 8-00 8-70 797 728 787 7:89 43:42 


The values for n = 2 and 3 are taken from Thomson (1955). There appears to be some ground
for suspicion that the values of y for 20 and 1000 are 0.02 or 0.03 high. Interpolation for
n = 4 to 9 should be simple and effective.


While the original approximation (to 0g,) was obtained entirely empirically, the writer 
owes the suggestion of its asymptotic explanation to his former colleague, David L. Wallace. 




REFERENCES 


DAVID, H. A., HARTLEY, H. O. & PEARSON, E. S. (1954). Biometrika, 41, 482-93.

ELFVING, G. (1947). Biometrika, 34, 111-19.

HARTLEY, H. O. & PEARSON, E. S. (1951). Biometrika, 38, 463.

PEARSON, E. S. (1932). Biometrika, 24, 404-17.

PEARSON, E. S. (1942). Biometrika, 32, 301-8.

PEARSON, E. S. & MERRINGTON, M. (1951). Biometrika, 38, 4-10.

THOMSON, G. W. (1955). Biometrika, 42, 268-9.

TIPPETT, L. H. C. (1925). Biometrika, 17, 364-87.




THE GAMBLER'S RUIN PROBLEM WITH CORRELATION


By C. MOHAN 
London School of Economics and Columbia University, New York 


1. INTRODUCTION 


In the classical problem of the gambler's ruin there is a constant probability of the gambler
winning any game and the games are independent. Feller, for example, has treated this
problem in his book (1950). We consider here a more general problem where there is a
correlation between the results of two successive games. We consider two players X, Y. Conditional
on winning the previous game, the probabilities that X wins or loses the next game are $p_1$, $q_1$,
and conditional on losing the previous game these probabilities are $q_2$, $p_2$. Thus X's ability
to win or lose is governed by the transition probability matrix (t.p.m.):

$$\begin{array}{c|cc} & \text{win} & \text{loss} \\ \hline \text{win} & p_1 & q_1 \\ \text{loss} & q_2 & p_2 \end{array} \qquad (1.1)$$

where
$$p_1 + q_1 = 1 = p_2 + q_2. \qquad (1.2)$$


If we suppose that the stake in each game is one coin, two successive games can have
for X one of the following pairs of results, the transition probabilities being given before
the colon:

$$p_1\colon\ 1 \to 1, \qquad q_1\colon\ 1 \to -1, \qquad p_2\colon\ -1 \to -1, \qquad q_2\colon\ -1 \to 1.$$

On the assumption that a score of +1 has the same probability as a score of -1 in the first
game, the coefficient of correlation is

$$R = \frac{p_1 p_2 - q_1 q_2}{\sqrt{(p_1 + q_2)(p_2 + q_1)}}. \qquad (1.3)$$

When $p_1 = p_2 = p$ (so that also $q_1 = q_2 = 1 - p = q$), the correlation becomes $R = p - q$.
This corresponds to a real situation, as, for example, a game between two players, of whom
the one holding the 'bank' has an advantage, the bank passing to the winner of each game.
We suppose that the probability of winning with the bank is p, and p > q, so that R > 0.
On the other hand, if the rule were that the bank passes to the loser of each game, we should
have a negative correlation R.

Suppose that X starts with a capital $r$ and Y with a capital $\alpha - r$, and the games are played
with a certain (positive or negative) correlation R, until one or other of the players is ruined.
How is the probability of ruin affected by R? In particular, if X had the choice of
determining whether the games were to be played with a positive or negative correlation, on
what basis should he decide so as to maximize his chances of winning?

This problem is analogous to a correlated random walk in the presence of absorbing
barriers at n = 0 and at n = $\alpha$, the particle starting from a position $r$ ($0 < r < \alpha$) and moving
a unit step to the right or to the left according to the t.p.m. (1.1).

The random walk is said to be symmetric if $p_1 = p_2 = p$. Then at the end of each step
there is a probability p of a step being in the same direction as the preceding one and a
probability q (= 1 - p) of the direction being reversed. We shall consider first the problem
for the general unsymmetric case corresponding to the t.p.m. (1.1).

In case the correlation is zero (i.e. $p_1 = q_2 = \bar p$, so that $q_1 = p_2 = 1 - \bar p$), the steps are to
the right with a constant probability $\bar p$ and to the left with a constant probability $1 - \bar p$;
thus there is a drift towards the right which is measured by $2\bar p - 1$. When $\bar p = \tfrac12$ the random
walk is both symmetric and uncorrelated.

In the case of unsymmetrical correlated walk, the drift may still be measured by the
quantity
$$\delta = p_1 - p_2.$$
Introducing the quantity
$$c = p_1 + p_2 - 1, \qquad (1.4)$$
the coefficient of correlation R given by (1.3) is now given in terms of c and $\delta$ by the relation
$$R^2(1 - \delta^2) = c^2. \qquad (1.5)$$

It will be noticed that in the symmetrical case ($\delta = 0$) c = R.
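Relations (1.3) to (1.5) are easily verified numerically. The following sketch computes R from $p_1$ and $p_2$ and checks it against c and $\delta$; the function name and the particular probabilities are illustrative.

```python
from math import sqrt


def correlation(p1, p2):
    """Coefficient of correlation (1.3) between the results of successive games."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return (p1 * p2 - q1 * q2) / sqrt((p1 + q2) * (p2 + q1))


p1, p2 = 0.7, 0.6
R = correlation(p1, p2)
delta = p1 - p2          # drift
c = p1 + p2 - 1.0        # quantity (1.4)
lhs, rhs = R * R * (1.0 - delta * delta), c * c  # two sides of (1.5)

R_symmetric = correlation(0.8, 0.8)  # symmetric case, where R = p - q
```

In the symmetric example p = 0.8, q = 0.2, so R reduces to p - q = 0.6.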


2. PROBABILITY OF ULTIMATE RUIN 


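Denote by $a_r$ and $b_r$ the probabilities of ultimate ruin of X (whose initial capital is $r$)
conditional on the first game being a win or a loss. It is easy to see that $a_r$ and $b_r$ satisfy the
recurrence relations

$$a_r = p_1 a_{r+1} + q_1 b_{r+1} \quad \text{and} \quad b_{r+1} = p_2 b_r + q_2 a_r \qquad (0 < r < \alpha - 1). \qquad (2.1)$$

Denoting by E the operation of increasing the suffix r by unity, these may be written as

$$(1 - p_1 E)\,a_r = q_1 E\,b_r, \qquad (E - p_2)\,b_r = q_2 a_r,$$

which show that $a_r$ and $b_r$ satisfy the difference equation

$$(E - p_2)(1 - p_1 E)X_r = q_1 q_2 E X_r, \quad \text{i.e.} \quad \left(E^2 - \frac{1 + c}{p_1}E + \frac{p_2}{p_1}\right)X_r = 0. \qquad (2.1a)$$

Hence

$$a_r = G + H\lambda^r, \qquad b_r = G + \frac{p_1 q_2}{p_2 q_1}H\lambda^r.$$

The boundary values are $b_1 = 1$ and $a_{\alpha-1} = 0$; whence

$$G + \frac{q_2}{q_1}H = 1, \qquad G + H\lambda^{\alpha-1} = 0,$$

so that

$$G = \frac{q_1\lambda^{\alpha-1}}{q_1\lambda^{\alpha-1} - q_2}, \qquad H = \frac{-q_1}{q_1\lambda^{\alpha-1} - q_2}.$$

Substituting these values of G and H we are led to the probabilities of ultimate ruin

$$a_r = \frac{q_1(\lambda^{\alpha-1} - \lambda^{r})}{q_1\lambda^{\alpha-1} - q_2} \quad \text{and} \quad b_r = \frac{q_1\lambda^{\alpha-1} - q_2\lambda^{r-1}}{q_1\lambda^{\alpha-1} - q_2}, \qquad (2.2)$$

where $\lambda = p_2/p_1$.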

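Now if $c_1$ denotes the probability of initial win and $c_2$ $(= 1 - c_1)$ of initial loss, then the general expression for $P_r$, the probability of ultimate ruin unconditional on the result of the first game, is
$$P_r = c_1 a_r + c_2 b_r = \frac{q_1\lambda^{\alpha-1} - c_1 q_1\lambda^{r} - c_2 q_2\lambda^{r-1}}{q_1\lambda^{\alpha-1} - q_2}. \tag{2-3}$$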
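The closed forms (2-2) and (2-3) can be checked numerically against the recurrence relations (2-1). The following is a minimal sketch using exact rational arithmetic; the function name `ruin_probabilities` and the parameter values are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction

def ruin_probabilities(p1, p2, alpha):
    """Return dicts a[r], b[r] (r = 1..alpha-1) from the closed forms (2-2).

    Assumes p1 != p2, so that lambda = p2/p1 differs from 1.
    """
    q1, q2 = 1 - p1, 1 - p2
    lam = p2 / p1
    top = q1 * lam ** (alpha - 1)
    den = top - q2
    a = {r: (top - q1 * lam ** r) / den for r in range(1, alpha)}
    b = {r: (top - q2 * lam ** (r - 1)) / den for r in range(1, alpha)}
    return a, b

# Illustrative values only (hypothetical, not taken from the paper).
p1, p2, alpha = Fraction(3, 5), Fraction(1, 2), 6
a, b = ruin_probabilities(p1, p2, alpha)

# Unconditional ruin probability (2-3) with an assumed initial win probability c1.
c1 = Fraction(1, 2)
P = {r: c1 * a[r] + (1 - c1) * b[r] for r in a}
```

With exact fractions the boundary values and both recurrences hold identically, which is a convenient way to catch a misplaced index in the formulae.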
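For $\lambda = 1$, (2-3) is indeterminate. In this symmetrical case ($p_1 = p_2 = p$, $q_1 = q_2 = q$) the equations (2-1) reduce to
$$a_r = p\,a_{r+1} + q\,b_{r+1} \quad\text{and}\quad b_{r+1} = p\,b_r + q\,a_r.$$
Hence, it is found that $P_r$ is now given by
$$P_r = \frac{c_2 + (\alpha - 1 - r)\,q}{1 + q(\alpha - 2)}. \tag{2-4}$$
$P_r$ can be expressed in terms of $\delta$ and $c$, since
$$\lambda = (c + 1 - \delta)/(c + 1 + \delta) \quad\text{and}\quad q_2/q_1 = (1 - c + \delta)/(1 - c - \delta).$$
As the formula in its general form is not easy to interpret, we take for illustration the symmetrical case $p_1 = p_2 = p$ with $c_1 = c_2 = \tfrac12$ and study the behaviour of $P_r$ as the correlation $R$ varies from $-1$ to $+1$. Now
$$R = p - q = 2p - 1 = 1 - 2q. \tag{2-5}$$

Let $\rho = 2R/(1 - R)$, so that $q = \tfrac12(1 - R) = 1/(\rho + 2)$. Then from (2-4)
$$P_r = \tfrac12 + \frac{\tfrac12\alpha - r}{\rho + \alpha}. \tag{2-6}$$

Now $\rho$ varies between $-1$ and $+\infty$ as $R$ varies between $-1$ and $+1$, and $\alpha > r > 0$. Hence $P_r$ lies between $\tfrac12$ and $1 - (r - \tfrac12)/(\alpha - 1)$.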
Thus $P_r \gtreqless \tfrac12$ according as $r \lesseqgtr \tfrac12\alpha$. It may be noticed that $dP_r/dR \lesseqgtr 0$ in the three cases $r \lesseqgtr \tfrac12\alpha$. The situation is shown below in the graph of equation (2-6) for three typical cases. Each curve cuts off from the vertical axis an intercept $(1 - r/\alpha)$; as $R \to +1$, $P_r \to \tfrac12$; when $R \to -1$, $P_r \to 1 - (r - \tfrac12)/(\alpha - 1)$. These results are easy to interpret, for when $R \to 1$ the probability of ultimate ruin depends less and less on the initial capital and should more and more approach the probability of losing the very first game; this is shown by the three curves approaching one another as $R \to 1$. The value of $P_r$ for $R = -1$ is actually non-existent, as is borne out by a study of the equations (2-1) which for $R = -1$ degenerate into a single set of equations
$$a_r = b_{r+1} \qquad (1 \le r \le \alpha - 2),$$
with the boundary values $a_{\alpha-1} = 0$, $b_1 = 1$. These $(\alpha - 2)$ equations are not soluble; in fact these show that $a_1, a_2, \ldots, a_{\alpha-2}$ are respectively equal to $b_2, b_3, \ldots, b_{\alpha-1}$ and none of them involves either of the boundary probabilities $a_{\alpha-1}$, $b_1$. On the other hand, when $R = 1$, equations (2-1) become
$$a_r = a_{r+1}, \qquad b_{r+1} = b_r \qquad (1 \le r \le \alpha - 2),$$
with the boundary values $b_1 = 1$, $a_{\alpha-1} = 0$. Hence
$$a_1 = a_2 = \ldots = a_{\alpha-1} = 0 \quad\text{and}\quad b_1 = b_2 = \ldots = b_{\alpha-1} = 1,$$
so that $P_r = c_1 a_r + c_2 b_r = c_2$ ($= \tfrac12$ for the graph).
We conclude that if X's capital is initially less than Y's, it is advantageous for him to 
choose the positively correlated play, and if initially greater than Y's, then to choose 


negatively correlated play. In each case the greater the correlation in magnitude the more 
it is to be preferred. 


Fig. 1. The graphs show the probability of ruin, $P_r$, as a function of the correlation $R$ (horizontal axis, from $-1$ to $1$) in a game in which the total capital is 12, for three cases of initial capital $r = 3$, 6 and 9.
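The three curves of Fig. 1 can be regenerated from equation (2-4) with $c_1 = c_2 = \tfrac12$ and $q = \tfrac12(1 - R)$. The sketch below is only an illustration; the function name `ruin_symmetric` and the grid of $R$ values are assumptions introduced here.

```python
from fractions import Fraction

def ruin_symmetric(r, alpha, R, c2=Fraction(1, 2)):
    """P_r from (2-4) for p1 = p2 = p, where R = p - q, i.e. q = (1 - R)/2."""
    q = (1 - R) / 2
    return (c2 + (alpha - 1 - r) * q) / (1 + q * (alpha - 2))

alpha = 12                                             # total capital used in Fig. 1
grid = [Fraction(k, 10) for k in range(-9, 10)]        # R = -0.9, -0.8, ..., 0.9
curves = {r: [ruin_symmetric(r, alpha, R) for R in grid] for r in (3, 6, 9)}
```

At $R = 0$ each curve passes through $1 - r/\alpha$; the curve for $r = 6 = \tfrac12\alpha$ is constant at $\tfrac12$, while the curves for $r = 3$ and $r = 9$ fall and rise towards it as $R$ increases.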


3. EXPECTED DURATION OF PLAY 


The expected value of the duration of play can also be determined by the use of difference equations. The method is justified if we assume that these expectations are finite. Let $A_r$, $B_r$ be the respective expected durations of play if the initial capital is $r$ and the first game results in a win or loss.

Then if the first game results in a win the capital becomes $r + 1$ and then the expected duration is $A_{r+1}$ with probability $p_1$ and $B_{r+1}$ with probability $q_1$. Hence
$$A_r = p_1 A_{r+1} + q_1 B_{r+1} + 1, \quad\text{similarly}\quad B_{r+1} = p_2 B_r + q_2 A_r + 1 \qquad (0 < r < \alpha - 1). \tag{3-1}$$
The boundary values are
$$B_1 = 1 = A_{\alpha-1}. \tag{3-2}$$

Equations (3-1) become homogeneous on putting
$$A_r = Kr + L + d_r, \qquad B_r = Mr + N + f_r, \tag{3-3}$$
if
$$Kr + L = p_1(Kr + K + L) + q_1(Mr + M + N) + 1,$$
$$Mr + M + N = p_2(Mr + N) + q_2(Kr + L) + 1;$$
i.e. if
$$K = p_1 K + q_1 M, \qquad L = p_1(K + L) + q_1(M + N) + 1,$$
$$M = p_2 M + q_2 K, \qquad M + N = p_2 N + q_2 L + 1.$$



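Remembering the relation (1-2), viz. $p_1 + q_1 = 1 = p_2 + q_2$, these equations are solved by
$$K = M, \qquad q_1(L - N) = M + 1 \quad\text{and}\quad q_2(L - N) = M - 1,$$
i.e.
$$K = M = (q_1 + q_2)/(q_1 - q_2) \quad\text{and}\quad L - N = 2/(q_1 - q_2). \tag{3-4}$$

This solution shows that the homogeneity of (3-1) achieved by the substitution (3-3) is not disturbed if the same constant is added to both the relations (3-3). This conclusion is corroborated by a study of the structure of equations (3-1); for if $A_r$ and $B_r$ are increased by $k$ (a constant with regard to $r$) then by virtue of the relation (1-2) $k$ cancels out from them. Consequently one of the constants $L$ and $N$ can be chosen arbitrarily. We therefore make the substitution
$$A_r = Kr + d_r, \qquad B_r = Mr + N + f_r,$$
i.e. set $L = 0$ in (3-3). This substitution transforms (3-1) into
$$d_r = p_1 d_{r+1} + q_1 f_{r+1}, \qquad f_{r+1} = p_2 f_r + q_2 d_r.$$
Proceeding as in §2 the solution of these equations is found to be
$$f_r = G + H\lambda^r, \qquad d_r = G + \frac{q_1}{q_2}H\lambda^{r+1},$$
so that
$$A_r = \frac{(q_1 + q_2)\,r}{q_1 - q_2} + G + \frac{q_1}{q_2}H\lambda^{r+1} \quad\text{and}\quad B_r = \frac{(q_1 + q_2)\,r - 2}{q_1 - q_2} + G + H\lambda^{r}.$$
The boundary values (3-2) give
$$\frac{q_1 + q_2 - 2}{q_1 - q_2} + G + H\lambda = 1 = \frac{(q_1 + q_2)(\alpha - 1)}{q_1 - q_2} + G + \frac{q_1}{q_2}H\lambda^{\alpha},$$
whence
$$H\lambda = -\frac{q_2\{(\alpha - 2)(q_1 + q_2) + 2\}}{(q_1 - q_2)(q_1\lambda^{\alpha-1} - q_2)}, \qquad G = \frac{2p_2}{q_1 - q_2} - H\lambda.$$
Thus
$$A_r = \frac{r(q_1 + q_2) + 2p_2}{q_1 - q_2} - \frac{(\alpha - 2)(q_1 + q_2) + 2}{q_1 - q_2}\cdot\frac{q_1\lambda^{r} - q_2}{q_1\lambda^{\alpha-1} - q_2}$$
and
$$B_r = \frac{r(q_1 + q_2) - 2q_2}{q_1 - q_2} - \frac{(\alpha - 2)(q_1 + q_2) + 2}{q_1 - q_2}\cdot\frac{q_2(\lambda^{r-1} - 1)}{q_1\lambda^{\alpha-1} - q_2}. \tag{3-5}$$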


We now consider the symmetrical case $q_1 = q_2$. The relations (3-4) are indeterminate in this case. We therefore try the quadratic expressions
$$A_r = K_1 + L_1 r + M_1 r^2, \qquad B_r = K_2 + L_2 r + M_2 r^2.$$
Substituting these expressions in
$$A_r = pA_{r+1} + qB_{r+1} + 1, \qquad B_{r+1} = pB_r + qA_r + 1, \tag{3-6}$$
and identifying the various powers of $r$,
$$K_1 = p(K_1 + L_1 + M_1) + q(K_2 + L_2 + M_2) + 1,$$
$$L_1 = p(L_1 + 2M_1) + q(L_2 + 2M_2), \qquad M_1 = pM_1 + qM_2,$$
$$K_2 + L_2 + M_2 = qK_1 + pK_2 + 1, \qquad L_2 + 2M_2 = qL_1 + pL_2, \qquad M_2 = qM_1 + pM_2.$$


The third and sixth give $M_1 = M_2$ and the other four are
$$q(K_1 - K_2) - pL_1 - qL_2 - M_1 - 1 = 0, \tag{3-7}$$
$$q(L_1 - L_2) - 2M_1 = 0, \tag{3-8}$$
$$q(K_2 - K_1) + L_2 + M_1 - 1 = 0, \tag{3-9}$$
and one identical with (3-8).
Adding (3-7) and (3-9)
$$p(L_1 - L_2) + 2 = 0, \tag{3-10}$$
so that by (3-8)
$$M_1 = \tfrac12 q(L_1 - L_2) = -q/p,$$
and finally by (3-9)
$$q(K_1 - K_2) = L_2 + M_1 - 1 = L_1 + 1/p \quad\text{by (3-10)}.$$

Thus
$$A_r = K_1 + [q(K_1 - K_2) - 1/p]\,r - (q/p)\,r^2, \qquad B_r = K_2 + [q(K_1 - K_2) + 1/p]\,r - (q/p)\,r^2. \tag{3-11}$$
The initial conditions (3-2) give
$$K_1 + (\alpha - 1)\{q(K_1 - K_2) - 1/p\} - (q/p)(\alpha - 1)^2 = 1 = K_2 + q(K_1 - K_2) + 1,$$
which are solved by
$$K_1 = \alpha \quad\text{and}\quad K_2 = -\alpha q/p.$$

Thus (3-11) become
$$A_r = \alpha + (r/p)(\alpha q - 1) - r^2 q/p \quad\text{and}\quad B_r = -(\alpha + r^2)\,q/p + (\alpha q + 1)\,r/p. \tag{3-12}$$


Hence the expected duration of play unconditional upon the result of the first game is
$$D_r = c_1\bigl[\alpha + (r/p)(\alpha q - 1) - r^2 q/p\bigr] + c_2\bigl[-(\alpha + r^2)\,q/p + (\alpha q + 1)\,r/p\bigr], \tag{3-13}$$
where $c_1$, $c_2$ are, as already defined, the initial probabilities of win and loss. Assuming the first win and loss to be equally probable (3-13) becomes
$$D_r = \frac{\alpha(p - q)}{2p} + \frac{q}{p}\,r(\alpha - r) = \frac{\alpha R + (1 - R)\,r(\alpha - r)}{1 + R}. \tag{3-14}$$

When $R = 0$, $D_r = r(\alpha - r)$, in agreement with Feller's formula (1950, p. 287). Further, since $r(\alpha - r) \ge \alpha - 1 \ge \tfrac12\alpha$ (as $\alpha \ge 2$), it follows that except in the trivial case $\tfrac12\alpha = 1 = r$, for positively correlated play the duration has smaller expectation than for the uncorrelated case and for negatively correlated play it has larger expectation. In fact as $R \to 1$, $D_r \to \tfrac12\alpha$ and as $R \to -1$, $D_r \to \infty$.
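The symmetrical-case formulae (3-12) can be verified directly against the recurrences (3-6) and the boundary values (3-2). The sketch below uses exact fractions; the function name `durations_symmetric` and the sample values of $p$ and $\alpha$ are assumptions chosen for illustration.

```python
from fractions import Fraction

def durations_symmetric(p, alpha):
    """A_r and B_r of (3-12) for p1 = p2 = p, with r = 1..alpha-1."""
    q = 1 - p
    A = {r: alpha + r * (alpha * q - 1) / p - r * r * q / p for r in range(1, alpha)}
    B = {r: -(alpha + r * r) * q / p + (alpha * q + 1) * r / p for r in range(1, alpha)}
    return A, B

# Illustrative values only (hypothetical, not taken from the paper).
p, alpha = Fraction(2, 3), 7
A, B = durations_symmetric(p, alpha)

# Unconditional expected duration when the first win and loss are equally probable.
D = {r: (A[r] + B[r]) / 2 for r in A}
```

For $p = \tfrac12$ (that is, $R = 0$) the same computation gives $D_r = r(\alpha - r)$, the uncorrelated result quoted from Feller.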


4. GENERATING FUNCTIONS FOR THE PROBABILITIES OF THE GAMBLER'S 
RUIN AT THE nth GAME 
Denote by $v_{r,n}$ the conditional probability that X (whose initial capital is $r$) is ruined at the $n$th game, given that he wins the first game, and by $w_{r,n}$ the corresponding probability when he loses the first game.
Evidently
$$v_{r,n} = 0 \ \text{ if } r > n - 2 \text{ or if } n - r \text{ is odd}, \quad\text{and}\quad w_{r,n} = 0 \ \text{ if } r > n \text{ or if } n - r \text{ is odd}. \tag{4-1}$$


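Define the generating functions (g.f.) of $v_{r,n}$, $w_{r,n}$:
$$V_r(s) = \sum_{n=1}^{\infty} v_{r,n}s^n, \qquad W_r(s) = \sum_{n=1}^{\infty} w_{r,n}s^n. \tag{4-2}$$

Elementary reasoning shows that the following recurrence relations hold
$$v_{r,n+1} = p_1 v_{r+1,n} + q_1 w_{r+1,n}, \qquad w_{r+1,n+1} = p_2 w_{r,n} + q_2 v_{r,n} \qquad (1 \le r < \alpha - 1,\ n \ge 1). \tag{4-3}$$
Evidently
$$w_{1,1} = 1, \qquad w_{1,n} = 0, \quad n > 1. \tag{4-4}$$
Define the boundary value
$$v_{\alpha-1,n} = 0 \qquad (n \ge 1) \tag{4-5}$$
in order that (4-3) may hold for $r = \alpha - 2$ also. Now multiply the equations (4-3) by $s^{n+1}$ and add from $n = 1$ to $\infty$; then since $v_{r,1} = 0 = w_{r+1,1}$ for $r \ge 1$, we have
$$V_r(s) = p_1 s V_{r+1}(s) + q_1 s W_{r+1}(s) \quad\text{and}\quad W_{r+1}(s) = p_2 s W_r(s) + q_2 s V_r(s). \tag{4-6}$$
Using the operator $E$ already defined these may be written as
$$(1 - p_1 sE)\,V_r(s) = q_1 sE\,W_r(s),$$
$$(E - p_2 s)\,W_r(s) = q_2 s V_r(s),$$
which show that $V_r(s)$ and $W_r(s)$ satisfy the difference equation
$$(E - p_2 s)(1 - p_1 sE)\,x_r = q_1 q_2 s^2 E\,x_r,$$
i.e.
$$\Bigl(E^2 - \frac{1 + cs^2}{p_1 s}E + \frac{p_2}{p_1}\Bigr)x_r = 0.$$
Hence
$$W_r(s) = A\eta_1^r + B\eta_2^r, \tag{4-7}$$
so that by (4-6)
$$V_r(s) = (q_2 s)^{-1}\{A\eta_1^{r+1} + B\eta_2^{r+1} - p_2 s(A\eta_1^r + B\eta_2^r)\},$$
where $\eta_1$, $\eta_2$ are the roots of the quadratic
$$p_1\eta^2 - \frac{1 + cs^2}{s}\,\eta + p_2 = 0.$$

Since (4-6) are homogeneous in $V_r(s)$, $W_r(s)$ and hold for $r = 1, 2, \ldots, \alpha - 2$, they can determine $V_r(s)$ and $W_r(s)$ for $r = 1, 2, \ldots, \alpha - 1$ if two of these $2(\alpha - 1)$ functions are known. These two known functions are $V_{\alpha-1}(s)$ and $W_1(s)$, since on using (4-2) the relations (4-4) and (4-5) lead to the boundary conditions
$$V_{\alpha-1}(s) = \sum_{n=1}^{\infty} v_{\alpha-1,n}s^n = 0, \qquad W_1(s) = \sum_{n=1}^{\infty} w_{1,n}s^n = s.$$
For these values (4-7) give
$$A\eta_1 + B\eta_2 = s, \qquad A\eta_1^{\alpha-1}(\eta_1 - p_2 s) + B\eta_2^{\alpha-1}(\eta_2 - p_2 s) = 0,$$
whence, writing
$$D(s) = \eta_1^{\alpha-1}\eta_2(\eta_1 - p_2 s) - \eta_1\eta_2^{\alpha-1}(\eta_2 - p_2 s),$$
we have
$$DA = -s\,\eta_2^{\alpha-1}(\eta_2 - p_2 s) \quad\text{and}\quad DB = s\,\eta_1^{\alpha-1}(\eta_1 - p_2 s).$$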

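Substituting these values in (4-7) and noting that
$$(\eta_1 - p_2 s)(\eta_2 - p_2 s) = \frac{p_2}{p_1} - \frac{p_2}{p_1}(1 + cs^2) + p_2^2 s^2 = \frac{p_2 q_1 q_2 s^2}{p_1},$$
we have finally
$$V_r(s) = q_1 s^2 D^{-1}(\eta_1^{\alpha}\eta_2^{r+1} - \eta_1^{r+1}\eta_2^{\alpha}) \tag{4-8}$$
and
$$W_r(s) = s D^{-1}\{\eta_1^{\alpha-1}\eta_2^{r}(\eta_1 - p_2 s) - \eta_1^{r}\eta_2^{\alpha-1}(\eta_2 - p_2 s)\}.$$
It may be noted that the probabilities $a_r$, $b_r$ of ultimate ruin calculated in the second section may be determined from these functions $V_r(s)$ and $W_r(s)$. In fact,
$$a_r = \sum_{n=r+2}^{\infty} v_{r,n} = \sum_{n=1}^{\infty} v_{r,n} = V_r(1),$$
and
$$b_r = \sum_{n=1}^{\infty} w_{r,n} = W_r(1).$$

Similarly, we have
$$A_r = V_r'(1) \quad\text{and}\quad B_r = W_r'(1).$$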
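The relation between the per-game probabilities and the probabilities of ultimate ruin can be illustrated by iterating the recurrences (4-3) numerically and summing over $n$. The sketch below is a rough check rather than the paper's method; the function name `ruin_by_recursion`, the truncation point `nmax` and the test values are assumptions.

```python
def ruin_by_recursion(p1, p2, alpha, nmax=2000):
    """Sum v_{r,n} and w_{r,n} over n = 1..nmax using the recurrences (4-3).

    Returns lists a, b indexed by r (entries 0 and alpha are unused), which
    approximate the ultimate-ruin probabilities a_r and b_r.
    """
    q1, q2 = 1 - p1, 1 - p2
    v = [0.0] * (alpha + 1)          # v[r] holds v_{r,n}; v_{r,1} = 0 for every r
    w = [0.0] * (alpha + 1)          # w[r] holds w_{r,n}
    w[1] = 1.0                       # (4-4): w_{1,1} = 1
    a, b = v[:], w[:]                # running sums over n
    for _ in range(nmax - 1):
        v_next = [0.0] * (alpha + 1)     # v_{alpha-1,n} stays 0, as in (4-5)
        w_next = [0.0] * (alpha + 1)     # w_{1,n} stays 0 for n > 1
        for r in range(1, alpha - 1):
            v_next[r] = p1 * v[r + 1] + q1 * w[r + 1]
            w_next[r + 1] = p2 * w[r] + q2 * v[r]
        v, w = v_next, w_next
        for r in range(1, alpha):
            a[r] += v[r]
            b[r] += w[r]
    return a, b

# Uncorrelated fair game (p1 = p2 = 1/2) with total capital 6, as a simple check.
a, b = ruin_by_recursion(0.5, 0.5, 6)
```

In this uncorrelated case a first win from capital 2 leaves capital 3 out of 6, so $a_2$ should be close to $\tfrac12$, and a first loss leaves capital 1, so $b_2$ should be close to $\tfrac56$.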


5. OBSERVATIONS ON THE CASE $\alpha = \infty$
(i) Probability of ultimate ruin
Evidently when $p_2 > p_1$, i.e. when the drift $\delta$ is negative, as $\alpha \to \infty$, $a_r$ and $b_r$ as given in (2-2) tend to unity. Even when $p_2 = p_1$, i.e. the probabilities of a win or loss being continued are equal, the formulae (2-2) show that with probability unity, X will be ruined; but when $p_2 < p_1$, then the probability of ultimate ruin, according as the first game is won or lost, is
$$\frac{q_1}{q_2}\lambda^{r} \quad\text{or}\quad \lambda^{r-1}.$$
(ii) Duration of play
(a) When $p_2 > p_1$, i.e. when the drift $\delta$ is negative, as $\alpha \to \infty$, the last terms in (3-5) tend to zero and
$$A_r \to \frac{r(q_1 + q_2) + 2p_2}{q_1 - q_2}, \qquad B_r \to \frac{r(q_1 + q_2) - 2q_2}{q_1 - q_2}.$$
(b) When $p_2 < p_1$, i.e. when there is a positive drift $\delta$, as $\alpha \to \infty$, (3-5) show that $A_r$ and $B_r \to \infty$.
(c) When $p_2 = p_1$, as $\alpha \to \infty$, (3-12) shows that $A_r$ and $B_r \to \infty$ and hence from (3-13) $D_r \to \infty$, unless $c_1 = 0 = q$. This exceptional case occurs when with probability unity the first game is lost, as also is each succeeding game, so that X is ruined in $r$ games.
In conclusion, I wish to express my gratitude to Dr F. G. Foster, who suggested this problem to me and also supervised the investigation.


REFERENCE 


FELLER, W. (1950). An Introduction to Probability Theory and its Applications. New York: Wiley and Sons.




TABLES FOR SIGNIFICANCE TESTS OF 2 × 2 CONTINGENCY TABLES


By P. ARMSEN 
National Institute for Personnel Research, South African Council 
for Scientific and Industrial Research 


1. INTRODUCTION 
In a 2 × 2 table of the type

               Y     Not Y
   X           a       c   |  A
   Not X       b       d   |  B
               R       S   |  N

it is assumed that a, b, c and d are the absolute frequencies resulting from a double dichotomy of N individuals according to the properties X and Y.

The null hypothesis to be tested is:

There is no association between X and Y. If this is true, one would expect in the long run of experiments with the same marginal totals A, B, R, S, to have a : b = A : B, or which is the same, N.a = A.R.

The whole 2 × 2 table can, for fixed marginal totals, be determined by one of the
entries. It is arbitrary which one would choose. The probability of the set of frequencies 
observed and of possible other sets of frequencies which might have been observed can be 
computed by the exact hypergeometric formula given by Fisher (1941, p. 95).* From this 
it follows that the possible sets of frequencies can be looked at as events with a completely 
known one-variate discrete probability distribution. It is bell-shaped or—in some cases — 
J-shaped, e.g. 


Figure 1. (Sketches of a bell-shaped and a J-shaped distribution, each over the range 0 to B.)

* There are detailed discussions of the applicability of the formula in Barnard (1947) and Cochran (1952).




The range of the distribution of b or d is from 0 to B, if B is the smallest marginal total. 

The present tables are based on direct evaluation of the hypergeometric formula.

It is possible to look at the given null hypothesis in two ways: 

(1) The object is to determine experimentally whether or not there is reasonable evidence for a positive association between X and Y (or, which is the same, for a negative association between X and Not Y). In this case it is natural and generally agreed to take the observed table as significant (for departure from the null hypothesis) at, say, the $100\alpha$ % level, if the sum of the probabilities of the observed and the possible more extreme tables (extreme in the same direction) is smaller than or equal to $\alpha$, where, for example, $\alpha = 0.05$ or $0.01$. This is called 'the case of one-tailed significance'.

(2) There is no a priori knowledge about the relationship between X and Y. Here a 
decision must be made before starting the experiment as to which of the possible events 
(sets of frequencies of 2 х 2 tables) are to be taken as evidence for an existing association 
between X and Y (positive or negative whichever the case may be). 

In this case we are looking out for alternatives to the null hypothesis which may throw the value of a frequency in the 2 × 2 table, say d, into either of the tails of the probability distribution. The usual procedure is to cut from each tail a number of terms for which the probability on the null hypothesis sums to $\le \tfrac12\alpha$. If the observed value falls in either of these tails we say that the result is significant at the $100\alpha$ % level, using a 'two-tailed' test.

In analogous problems where the probability distribution is continuous it is possible to find significance points, whether one-tailed or two-tailed, defining rejection regions associated precisely with any desired value of $\alpha$. For discontinuous variation which arises when dealing with the hypergeometric type of distribution of our present problem or with the binomial or Poisson distribution, this is not the case.

The position is illustrated in Example 1 given in the following section. If we take $\alpha = 0.05$, and look for significance in the sense of a positive relation between X and Y, we shall only establish this if $d = i$ is 6 or 7. But the probability of this result is only $p(6) + p(7) = 0.71$ %, far below the 5 % aimed at. If we use a two-tailed test based on cutting off regions from the two ends of the distribution for each of which the sum of the probabilities is $\le 2.5$ %, then we see that this region will again only include $i = 6$ and 7.

Unless we adopt a procedure of random sampling for a decision, discussed, for example, by Tocher (1950) and Cochran (1952), there is no means of obtaining exactly 5 % for the rejection region. It will, however, be noted that while for the single-tailed test the region is clearly defined, for the two-tailed test alternative regions are possible if we drop the usual convention of making the probability associated with each tail region sum to $\tfrac12\alpha$ or less. Thus coming back to Example 1, it might be asked whether there is serious objection to including $i = 0$, 6 and 7 in the two-tailed 5 % rejection region, giving a total probability of 2.58 + 0.71 = 3.29 % (still under 5 %) on the null hypothesis?
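The figures quoted for Example 1 can be reproduced from the hypergeometric formula with the marginal totals of that example (B = 7, S = 15, N = 40). The following is a minimal sketch; the function name `hypergeom_pmf` is an assumption introduced for illustration.

```python
from math import comb

def hypergeom_pmf(i, B, S, N):
    """Probability that the entry d equals i, for fixed marginal totals B, S and total N."""
    return comb(S, i) * comb(N - S, B - i) / comb(N, B)

# Example 1: A = 33, B = 7, R = 25, S = 15, N = 40.
dist = [hypergeom_pmf(i, 7, 15, 40) for i in range(8)]

upper_tail = dist[6] + dist[7]        # probability of i = 6 or 7
with_lower = dist[0] + upper_tail     # adding i = 0 to the rejection region

for i, prob in enumerate(dist):
    print(f"i = {i}: {100 * prob:6.2f} %")
```

The upper tail comes to about 0.71 % and $p(0)$ to about 2.58 %, so the enlarged region still carries less than 5 %.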

The object of this paper is to consider alternative definitions of the rejection regions for the two-tailed test, and also to provide tables of 5 and 1 % significance levels for both one- and two-tailed tests, and for 2 × 2 tables containing up to N = 50 observations. These tables, by adopting a new method of entry, make it possible to go beyond the range of existing tables using relatively little space.




2. CHOICE OF THE REJECTION REGION FOR THE TWO-TAILED TEST OF SIGNIFICANCE 


The definition of a two-tailed $100\alpha$ % significance region referred to in the introductory section, which has been used largely, e.g. in Finney's (1948) tables, may be expressed as follows:

Definition $D_1$. Cut as much as possible but not more than $\tfrac12\alpha \times 100$ % from each tail of the distribution under the null hypothesis.

While in any particular case, when the hypergeometric probabilities have been calculated, it may seem clear on intuitive grounds how to enlarge this rejection region by including one or more further values of d and yet keep the total probability below $100\alpha$ %, when preparing tables for general use it is necessary to have some clearly defined, consistent rule of procedure.

A possible procedure which the author prefers is given by the following definition which 
is that used in the mathematical description of the tables (pp. 502-5 below): 

Definition $D_2$. Arrange the possible events (defined say in terms of d) in ascending order of the size of their probabilities under the null hypothesis; include in the $100\alpha$ %, two-tailed rejection region those events for which the cumulative sum of these ordered probabilities is smaller than or equal to $\alpha$. (The kind of association, positive or negative, which is likely to exist will be indicated by the observed event.)

An equivalent visual description, which can be illustrated on the diagrams given above, is as follows:

Lift a parallel to the horizontal axis until the sum of the probability ordinates falling below this line has reached as closely as possible, but not exceeded, $\alpha$. Each event whose associated probability ordinate is now completely below the parallel is two-tailed significant at the $100\alpha$ % level. The others are not.
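The two definitions can be compared by direct computation. The sketch below implements $D_1$ and $D_2$ as worded above and applies them to the marginal totals of Examples 1 and 2; the function names are assumptions, and the handling of exactly tied probabilities under $D_2$ (taken in index order) is a choice made here, since the text does not specify it.

```python
from fractions import Fraction
from math import comb

def pmf(B, S, N):
    """Null distribution of the entry d = i for fixed margins B, S and total N."""
    return [Fraction(comb(S, i) * comb(N - S, B - i), comb(N, B)) for i in range(B + 1)]

def region_d1(p, alpha):
    """D1: cut as much as possible, but not more than alpha/2, from each tail."""
    region = set()
    for order in (range(len(p)), reversed(range(len(p)))):   # lower tail, then upper tail
        total = 0
        for i in order:
            if total + p[i] > alpha / 2:
                break
            total += p[i]
            region.add(i)
    return sorted(region)

def region_d2(p, alpha):
    """D2: add events in ascending order of probability while the running sum stays <= alpha."""
    region, total = [], 0
    for i in sorted(range(len(p)), key=lambda j: p[j]):
        if total + p[i] > alpha:
            break
        total += p[i]
        region.append(i)
    return sorted(region)

ex1 = pmf(7, 15, 40)     # Example 1: B = 7, S = 15, N = 40
ex2 = pmf(7, 14, 40)     # Example 2: B = 7, S = 14, N = 40
print(region_d1(ex1, Fraction(5, 100)), region_d2(ex1, Fraction(5, 100)))
```

Exact fractions are used so that comparisons with $\alpha$ are not affected by rounding.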

The advantages of this definition $D_2$ are that:

(i) It is straightforward to apply.
(ii) It will often include more points in the rejection region than will $D_1$, without increasing the overall probability above $\alpha$.

(iii) It is consistent in the sense that if an observed event is significant at the $100\alpha_1$ % level, then it must also be significant at any $100\alpha_2$ % level, where $\alpha_2 > \alpha_1$.

(iv) There is a simple way of generalization to the case of multivariate discontinuous distributions, e.g. for the h × k contingency table.

It can, however, lead to certain anomalies which, as Prof. E. S. Pearson has pointed out 
to me, may not be regarded as acceptable. Consider the three examples given below. 

The definition $D_3$ will be given later. In Example 1, $D_2$ includes $i = 0$ in the rejection region which $D_1$ does not, and this seems very reasonable. For the two-tailed 5 % level in Example 2, $D_2$ again gives a larger region than $D_1$, although it is open to question whether it would not have been better to include $i = 0$ rather than $i = 5$ in the region. When, however, we take a 7.2 % level* in this example, we find a more serious difficulty. Although
$$p(0) = 3.53\ \% < \tfrac12 \times 7.2\ \%,$$
$D_2$ does not make $i = 0$ significant, but chooses $i = 5$ instead. Thus in this instance $D_2$ contains no term or terms from the lower tail of the distribution.

* It should be made clear that anomalies of this kind arise only rarely and therefore we have illustrated this situation on Example 2, taking an unconventional significance level.


Example 1

   a    c  |  A             .    .  |  33
   b    d  |  B             .    i  |   7
   R    S  |  N            25   15  |  40


If d is replaced by a variate i and the marginal totals are kept constant, the probability 
distribution of i (under the null hypothesis) is as follows: 


Cases of i which are two-tailed 
significant at the 5% level 


Example 2

    .    .  |  33
    .    i  |   7
   26   14  |  40


Two-tailed 7.2 % level


Example 3

    .    .  |  31
    .    i  |  12
   31   12  |  43


   i              1     ...     7       8       9      10   11   12
   p(i) in %    6.624   ...   0.877   0.102   0.006     0    0    0

   Two-tailed significant at the 1 % level
   D1                                   8       9      10   11   12
   D2                           7       8       9      10   11   12
   D3                                   8       9      10   11   12

If we consider what is the objection to the $D_2$ region in this last respect, we seem to reach the following broad conclusions. The purpose of a two-tailed test is to pick out departures from the null hypothesis in either direction, i.e. to establish significance when there is either a marked positive or negative association between X and Y. Clearly it can only act in this way if the rejection region contains terms from both tails of the distribution. In the case of extreme skewness where the first (or last) term has a probability that is greater than $100\alpha$ %, 'two-tailedness' is obviously impossible, but in cases like those of Examples 2 and 3 it is possible to have a two-tailed 5 % (Ex. 2) and 1 % (Ex. 3) rejection region. What is less clear is how to define that region in general terms.

It is possible that a more fundamental attack on the problem could be based on a study of the power function of the test.* We shall, however, be content here to put forward a definition giving a region which must include that resulting from the customary $D_1$ but which, as with $D_2$, will often contain more points. If we start from the two-tailed region of $D_1$ there will often be only one choice of a point to be included in the wider region; this is the case for Example 1, where the 5 % rejection region under $D_1$ (including $i = 6, 7$) can only be extended by including $i = 0$. In the case of Example 2, however, the 5 % region of $D_1$ (6, 7) can be extended either by including $i = 0$ or 5 but not, of course, both.

A number of alternative definitions were considered; some of these were ambiguous or could lead to the inconsistency referred to under (iii) on p. 496 above. Finally, we have taken a definition $D_3$, which appears generally, though not quite always, to satisfy intuitive requirements for a two-tailed test.

Definition $D_3$. Define $F(E)$, or the 'first tail' probability, as the cumulative sum of the probabilities under the null hypothesis of all possible events which are more extreme, in the same direction, than a given event $E$, including the probability of $E$ itself. Define $S(E)$, or the 'second tail' probability, as the cumulative sum of probabilities, starting with that of the event most extreme in the opposite direction as compared with $E$, and cumulating up to but not exceeding the value of $F(E)$. If, and only if, $F(E) + S(E) \le \alpha$, include $E$ in the rejection region for the two-tailed $100\alpha$ % level of significance.

* [In this connexion it is perhaps relevant to note that when considering, on the basis of the power function, an 'unbiased' two-tailed test in the case of continuous asymmetrical variation, Neyman & Pearson (Statist. Res. Mem. 2 (1936), 18-25) found that a larger probability (i.e. $> \tfrac12\alpha$) should be cut off from the steeper tail of the null hypothesis distribution than from the finer tail. Ed.]

The rejection regions defined by $D_3$ are shown below the table of probabilities for each of the Examples 1-3.

It can be readily seen that every event which is significant under $D_1$ will be significant under $D_3$, but the regions under $D_2$ and $D_3$ will not necessarily correspond. This last point is illustrated in Example 2, where $D_3$ may be thought to meet the requirements for a two-tailed test better than $D_2$. On the other hand, in the case of Example 3, where a two-tailed 1 % rejection region is required, $D_3$ gives a region no wider than $D_1$, failing to include the point $i = 7$. It is interesting to note that in this case there is a genuinely two-tailed region which could be used, namely, that including $i = 0, 9, 10, 11, 12$, giving a total probability of 0.926 < 1.000 %. From the point of view of all-round power of discrimination it is possible that this last region might have some claim for consideration.
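Definition $D_3$ can be applied mechanically to the three examples. The following sketch is one reading of the definition, not the author's computing procedure; the function names are assumptions, and an event at the mode is treated here as belonging to the upper tail, which the text does not specify.

```python
from fractions import Fraction
from math import comb

def pmf(B, S, N):
    """Null distribution of the entry d = i for fixed margins B, S and total N."""
    return [Fraction(comb(S, i) * comb(N - S, B - i), comb(N, B)) for i in range(B + 1)]

def region_d3(p, alpha):
    """D3: include E when first tail F(E) plus second tail S(E) is <= alpha."""
    mode = max(range(len(p)), key=lambda j: p[j])
    region = []
    for e in range(len(p)):
        if e >= mode:                        # E in the upper tail (assumed for the mode itself)
            first, opposite = sum(p[e:]), p[:e]          # opposite tail starts at i = 0
        else:                                # E in the lower tail
            first, opposite = sum(p[:e + 1]), p[:e:-1]   # opposite tail starts at i = B
        second = 0
        for x in opposite:                   # cumulate up to, but not exceeding, F(E)
            if second + x > first:
                break
            second += x
        if first + second <= alpha:
            region.append(e)
    return region

print(region_d3(pmf(7, 15, 40), Fraction(5, 100)))    # Example 1, 5 % level
print(region_d3(pmf(7, 14, 40), Fraction(5, 100)))    # Example 2, 5 % level
print(region_d3(pmf(12, 12, 43), Fraction(1, 100)))   # Example 3, 1 % level
```

For Example 2 this gives a region containing $i = 0$ as well as $i = 6, 7$, and for Example 3 it excludes $i = 7$, in line with the discussion above.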

To illustrate anomalies and differences between definitions, we have naturally picked out exceptional cases, but it should be emphasized that these cases are very rare. Thus, among all the 2 × 2 tables with $N \le 50$ considered in building up the tables of significance points given below, there were only thirty-two cases in which the regions defined by $D_2$ and $D_3$ differed (one case of disagreement usually results in two changes in the tables).

In forming the tables with significance levels of 5 and 1 %, the computation was done independently three times, once on the basis of $D_2$ and later on the basis of $D_3$ and of another alternative definition, not given here, which was finally discarded for possible inconsistency. The tables contain the results according to $D_3$. The cases where the result of using $D_2$ is different are indicated by + or − signs above or below the corresponding entry; thus a + indicates that the entry is to be increased by 1, a − that it is to be decreased by 1, to obtain the limit under $D_2$.

In the following section we give directions for use of the tables with an example, while an Appendix contains the mathematical description of the method by which the tables with this special form of entry have been built up.


3. DIRECTIONS FOR USE OF THE TABLES 


$N \le 50$ is the number of observations.
The tables are sometimes applicable in cases $N > 50$. See Note 2.
No 2 × 2 table where $N < 6$ is significant one-tailed at the 5 % level.


Step 1. Arrange the given 2 × 2 table in the form


   a    c  |  A
   b    d  |  B
   R    S  |  N


where A is the largest and B is the smallest (or in case of equality one of the smallest) of the four marginal totals, A, B, R, S, and $aB > Ab$.
If $aB = Ab$, the table is not significant.


Step 2. Note the four figures:


   d                                  for finding the appropriate table,
   b
   c − b = S − B = A − R ≥ 0          for finding the place in the table,
   a − d = R − B = A − S ≥ 0          for decision.


Step 3. Look at the table at the end of this paper headed $i = d$ and find the intersection of arrays $x = b$ and $y = c - b$. There are four entries


   z1   z2
   z3   z4        (exceptions see Notes 1, 2 and 3)


for the one-tailed and two-tailed points of significance according to the scheme:


                  One-tailed    Two-tailed
   5 % level          z1            z2
   1 % level          z3            z4


Step 4. Compare the figure $a - d$ with the entry $z_j$ in which you are interested ($j = 1, 2, 3, 4$), e.g. the given 2 × 2 table is two-tailed significant at the 5 % level, if and only if $a - d \ge z_2$. Correspondingly for $z_1$, $z_3$ and $z_4$.
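Steps 1 and 2 can be sketched in code: the table is put into the agreed arrangement by trying row swaps, column swaps and transposition, and the four figures used for the look-up are then read off. The function names and the dictionary keys below are assumptions made for illustration; the significance tables themselves are not reproduced.

```python
def arrange(table):
    """Return ((a, c), (b, d)) with B the smallest marginal total and aB >= Ab, or None."""
    (p, q), (r, s) = table
    for t in ([[p, q], [r, s]], [[p, r], [q, s]]):            # the table and its transpose
        for rows in (t, t[::-1]):                             # with or without swapping rows
            for m in (rows, [row[::-1] for row in rows]):     # with or without swapping columns
                (a, c), (b, d) = m
                A, B, R, S = a + c, b + d, a + b, c + d
                if B == min(A, B, R, S) and a * B >= A * b:
                    return (a, c), (b, d)
    return None

def lookup_keys(table):
    """d selects the table, (x, y) = (b, c - b) the cell, and a - d is compared with z_j."""
    (a, c), (b, d) = arrange(table)
    return {"d": d, "x": b, "y": c - b, "a_minus_d": a - d}

# A table with rows (4, 18) and (12, 9), before arrangement.
keys = lookup_keys([[4, 18], [12, 9]])
print(arrange([[4, 18], [12, 9]]), keys)
```

The case $aB = Ab$ is returned as an arrangement here as well; by Step 1 such a table is simply not significant.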


Note 1. If no entry is given, the answer is:


(a) Not significant for $N \le 50$ if the missing entry should be to the right or below any given entry in the table headed $i = d$, or somewhere to the right of the whole table; e.g. $i = 11$, $x = 5$, $y = 14$, $a - d = 2$. The missing entry should be to the right of the given entry $z_1 = 8$, $z_2 = 11$, $x = 3$, $y = 14$. Therefore the event

   13   19
    5   11

is not significant at the 5 % level.


(b) Significant two-tailed at the 1 % level if the missing entry should be to the left or above any given entry in the table headed $i = d$ or somewhere to the left of the whole table; e.g. $i = 16$, $x = 2$, $y = 13$, $a - d = 1$. Because $x < 3$, the missing entry should be to the left of the whole table for $i = 16$. Therefore the event

   17   15
    2   16

is significant two-tailed at the 1 % level.


(c) The entry '—' instead of a numerical entry stands for 'not significant for $N \le 50$'.


Note 2. These tables are sometimes applicable in cases where $B + S \le 50 < N$, i.e. in cases where an entry is given or missing to the left. These tables are not applicable if $50 < B + S$. In most practical cases (assuming $N > 50$) where these tables fail to give an answer, the Table VIII given by Fisher & Yates (1949) gives an answer and vice versa in all cases for $N \le 50$.


Note 3. In cases where only two entries (counting '—' as an entry) are given instead of four, each entry stands for both the one-tailed and two-tailed point of significance.



Example for application of the tables

Given

    4   18  |  22
   12    9  |  21
   16   27  |  43

Step 1. The table is rearranged as

   18    9  |  27
    4   12  |  16
   22   21  |  43

B = 16 < 21 < 22 < 27, A = 27 > 22 > 21 > 16, a = 18, b = 4, Ba = 16 × 18 = 288 > 108 = 27 × 4 = Ab; consequently the arrangement is correct.

The only possible other arrangement after having chosen B as the smallest marginal total is

    9   18  |  27
   12    4  |  16
   21   22  |  43

But now a = 9, b = 12, Ba = 16 × 9 = 144 < 324 = 27 × 12 = Ab, contrary to the agreed kind of arrangement.

Step 2

   d = 12               for finding the appropriate table,
   b = x = 4
   c − b = y = 5        for finding the place in the table,
   a − d = 6            for comparison with z_j.

Step 3. In the table headed i = 12 at the intersection of x = 4 and y = 5, the entries

   z1   z2                      1   2
   z3   z4     are given as     6   8

Step 4. Result: the given 2 × 2 table is
   significant (one-tailed) at the 5 % level (6 > 1),
   significant (two-tailed) at the 5 % level (6 > 2),
   significant (one-tailed) at the 1 % level (6 = 6),
   not significant (two-tailed) at the 1 % level (6 < 8).
The entry $z_2 = 2$ shows that even the less significant table with the same d, B, S (b and c), and with $a - d = 2$ or

   14    9  |  23
    4   12  |  16
   18   21  |  39

is significant (two-tailed) at the 5 % level. But $z_1 = 1$ shows that the table with $a - d = 0$ or

   12    9  |  21
    4   12  |  16
   16   21  |  37

is not significant (one-tailed) at the 5 % level.

$z_4 = 8$ shows that the first 2 × 2 table with the same d, B and S as the given table, which is significant (two-tailed) at the 1 % level, is with $a - d = 8$ or

   20    9  |  29
    4   12  |  16
   24   21  |  45

Note 4. A user who wants to apply definition $D_3$ must neglect all + or − signs attached to any entry in the table.
A user who wants to apply definition $D_2$ must read a + sign attached to an entry as: add 1 to the given entry, and a − sign as: subtract 1 from the given entry. For example:

In the case $i = 5$, $x = 1$, $y = 12$, by $z_2 = 21$ (marked −) is meant:

   in case of D3:  z2 = 21,
   in case of D2:  z2 = 21 − 1 = 20.

Correspondingly, in the case $i = 9$, $x = 0$, $y = 21$, by $z_4 = 9$ (marked +) is meant:

   in case of D3:  z4 = 9,
   in case of D2:  z4 = 9 + 1 = 10.


APPENDIX 
Mathematical description of the tables 
The tables are completely based on the well-known exact formula:
$$p(d \mid BSN) = \frac{A!\,B!\,R!\,S!}{N!\,a!\,b!\,c!\,d!} = \binom{S}{d}\binom{N-S}{B-d}\Big/\binom{N}{B},$$
where $p(d \mid BSN)$ is the (single) probability of obtaining d with given marginal totals B and S and given number of observations N, assuming that both classifications are independent of each other.

It follows directly from the arrangement of the table [$B = \operatorname{Min}(A, B, R, S)$ and $Ba \ge Ab$] that $Nd \ge BS$ and $d \le B \le S \le N - B$, in particular $2B \le N$.

For given d, B, S and N the probability $p(i \mid BSN)$ ($i = 0, 1, 2, \ldots, B$) is a step function of $i$ only. This function, call it $p(i)$, has the following properties:

It rises monotonically from $i = 0$ to a maximum which is at $i_{\max} = [m]$, where
$$(N + 2)\,m = (B + 1)(S + 1),$$
and falls monotonically until $i = B$.

There are some special cases:

(1) $i_{\max}$ may be 0 or B, in which cases $p(i)$ is monotonically decreasing or increasing in its whole range.

(2) There are two equal values $p(i_{\max}) = p(i_{\max} - 1)$ in the cases where $m$ is an integer.

Proof.
$$p(i) : p(i + 1) = n : D, \quad\text{where } n = (i + 1)(N - S - B + i + 1) \text{ and } D = (B - i)(S - i).$$
Thus
$$p(i) < p(i + 1) \ \text{ if } v(i) < 0, \qquad p(i) > p(i + 1) \ \text{ if } v(i) > 0,$$
where $v(i)(N + 2) = n - D$, $v(i) = i + 1 - m$.
Therefore, if $i \ge i_{\max}$, $v(i) \ge [m] + 1 - m > 0$; if $i < i_{\max}$, $v(i) \le [m] - m \le 0$,
which proves the above statements.
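The position of the maximum can be checked numerically against $i_{\max} = [m]$. The sketch below uses the marginal totals of Examples 1 to 3, plus a small hypothetical table in which $m$ is an integer; the function names are assumptions.

```python
from fractions import Fraction
from math import comb

def pmf(B, S, N):
    """Null distribution p(i), i = 0..B, for fixed margins B, S and total N."""
    return [Fraction(comb(S, i) * comb(N - S, B - i), comb(N, B)) for i in range(B + 1)]

def i_max(B, S, N):
    """[m], the integer part of m, where (N + 2) m = (B + 1)(S + 1)."""
    return (B + 1) * (S + 1) // (N + 2)

for B, S, N in [(7, 15, 40), (7, 14, 40), (12, 12, 43)]:
    p = pmf(B, S, N)
    print(B, S, N, "mode at", i_max(B, S, N), "p =", float(p[i_max(B, S, N)]))

# Hypothetical case with integer m: B = 3, S = 3, N = 6 gives m = 2.
tie = pmf(3, 3, 6)
```

In the integer case the two neighbouring ordinates $p(i_{\max})$ and $p(i_{\max} - 1)$ coincide, as stated in special case (2).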




It may be shown that x] 1 
[x << |+. 


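Definitions. $f = f(d \mid BSN)$ or 'first tail':
$$f(d \mid BSN) = \sum_{i=d}^{B} p(i \mid BSN);$$
$s = s(d \mid BSN)$ or 'second tail':
$$s(d \mid BSN) = \sum_{i=0}^{k} p(i \mid BSN), \quad\text{where } k < d,$$
$$p(k \mid BSN) \le p(d \mid BSN), \qquad p(k + 1 \mid BSN) > p(d \mid BSN) \qquad (k + 1 = d \text{ in the case } d = i_{\max}).$$

If no $k$ exists (that is, if $p(0 \mid BSN) > p(d \mid BSN)$), then $s(d \mid BSN) = 0$.
[$k$ is (given B and S) a function $k(d, N)$ and $Nk < BS$.]
$t = t(d \mid BSN)$ or 'two-tailed probability',
$$t = f + s.$$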
From these definitions follows: 


Statement 1. $p(i \mid BSN) > p(i \mid BS, N + 1)$ if $i \ge d$.

Statement 2. $f(d \mid BSN) \ge f(d \mid BS, N + 1)$ exactly.

Statement 3. $p(i \mid BSN) < p(i \mid BS, N + 1)$ if $i \le k(d, N)$.

Statement 4. $t(d \mid BSN) \ge t(d \mid BS, N + 1)$ not exactly, but with sufficient approximation in practice.


The construction of the tables is based on Statements 2 and 4. 

For the one-tailed case, it would be enough to compute f(d | BSN) for fixed d, B, S and increasing N 
until the table is found to be significant. Then Statement 2 enables the tables to be condensed to the 
given form. For the two-tailed case, Statement 4 could serve the same purpose if it were exact. The 
inexactness of Statement 4 was the reason for computing all possible cases until N = 50. The com- 
putation was done by direct evaluation of the exact formula for t(d | BSN). 

Thus the entries $z_1$ and $z_3$ give answers based on mathematically exact general reasons. The entries $z_2$ and $z_4$ are (although of course correct in the tables) based only on empirical computation. An exact proof of Statement 4 is impossible, as one can show examples where it is not true.


Proofs for Statements 1, 2 and 3
(1) $(h + BS)\,p(i \mid BSN) = [h + i(N+1)]\,p(i \mid BS, N+1)$, where $h = (N+1)(N+1-B-S)$. Statement
1 follows on account of $(N+1)i > iN \ge dN > BS$.
(2) Follows directly from 1.
(3) $$\frac{BS}{N} - \frac{BS}{N+1} = \frac{BS}{N(N+1)} \le \frac{B(N-B)}{N(N+1)} \le \frac{N}{4(N+1)} < \frac{1}{4}.$$
Corresponding to 1, one finds:

* $p(i, N) \ge p(i, N+1)$ as long as $(N+1)i \ge BS$,

and for $(N+1)i < BS$, $p(i, N) < p(i, N+1)$.
But for the single possible value i with $BS/(N+1) \le i < BS/N$ it follows that $i = i_{\max}$, because
$$m - \frac{BS}{N+1} = \frac{B+S+1}{N+2} - \frac{BS}{(N+1)(N+2)} < 1,$$
and this, together with $i \le k(d, N)$, contradicts the definition of k, which proves Statement 3.


Conclusions from Statement 3
(1) k(d, N) is a monotonically decreasing step function of N, or $k(d, N) \ge k(d, N+1)$.

(2) As long as $p[k(d, N) \mid BS, N+1] \le p(d \mid BS, N+1)$, it follows that $s(d \mid BSN) < s(d \mid BS, N+1)$;
but in the cases where $p[k(d, N) \mid BS, N+1] > p(d \mid BS, N+1)$, there is one member less in s(N + 1)
than in s(N), and thus $s(d \mid BSN) > s(d \mid BS, N+1)$.

(It is easy to prove that $p(i \mid BSN) > p(i-1 \mid BS, N+1)$ if $i \le k(d, N)$.)


* By p(i, N) is meant p(i | BSN).


Thus the second tail as a function of N increases monotonically step by step in intervals of N, where
$$k(d, N+1) = k(d, N),$$
and decreases in big steps from N to N + 1, where
$$k(d, N+1) < k(d, N)$$
(or where the length of the second tail decreases).
Of course, $\lim_{N\to\infty} s(d \mid BSN) = 0$, because $\lim_{N\to\infty} i_{\max} = 0$.


Question of Statement 4
$\lim_{N\to\infty} t(d \mid BSN) = 0$, because t = s + f, lim s = 0, and lim f = 0, the last equation following from:
$$\lim_{N\to\infty} i_{\max} = 0 \quad \text{and} \quad \lim_{N\to\infty} p(0, N) = \lim_{N\to\infty} \binom{N-S}{B} \Big/ \binom{N}{B} = 1.$$

The question now is: Is this decreasing of t(d | BSN) with increasing N monotonic like that of
f(d | BSN) or not? Unfortunately it is not.
Proof. Computation gives, e.g. for
$$d = 9, \quad B = 9, \quad S = 36, \quad N = 48,$$
$$t(9 \mid 9, 36, 48) < 0.0883,$$
$$t(9 \mid 9, 36, 49) > 0.0892.$$
Therefore, Statement 4 is certainly not exact.*
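
The counter-example can be re-computed directly from the definitions of the two tails. The following is a minimal sketch, not the author's computing scheme: it assumes that p(i | BSN) is the hypergeometric probability with margins B and S, that the first tail runs from i = d upwards, and that k is the largest index below d with p(k) not exceeding p(d). The function names `p` and `tails` are illustrative only.

```python
from fractions import Fraction
from math import comb


def p(i, B, S, N):
    """Hypergeometric probability p(i | B, S, N), kept exact."""
    if i < max(0, B + S - N) or i > min(B, S):
        return Fraction(0)
    return Fraction(comb(B, i) * comb(N - B, S - i), comb(N, S))


def tails(d, B, S, N):
    """Return (f, s, t): first tail, second tail and two-tailed probability."""
    lo, hi = max(0, B + S - N), min(B, S)
    pd = p(d, B, S, N)
    f = sum((p(i, B, S, N) for i in range(d, hi + 1)), Fraction(0))
    # k is taken as the largest index below d with p(k) <= p(d);
    # if there is none, the second tail is empty.
    below = [i for i in range(lo, d) if p(i, B, S, N) <= pd]
    s = Fraction(0)
    if below:
        s = sum((p(i, B, S, N) for i in range(lo, below[-1] + 1)), Fraction(0))
    return f, s, f + s


f48, s48, t48 = tails(9, 9, 36, 48)
f49, s49, t49 = tails(9, 9, 36, 49)
print(f"N=48: f={float(f48):.5f}  s={float(s48):.5f}  t={float(t48):.5f}")
print(f"N=49: f={float(f49):.5f}  s={float(s49):.5f}  t={float(t49):.5f}")
```

Under these assumptions the first tail falls from N = 48 to N = 49, as Statement 2 requires, while the two-tailed probability rises, because the second tail gains more than the first tail loses.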
The size of t(N) and t(N + 1) is influenced mainly by the sums
$$\tau_N = p(d \mid BSN) + p(k \mid BSN) \quad \text{and} \quad \tau_{N+1} = p(d \mid BS, N+1) + p(k \mid BS, N+1),$$
where both times k = k(d, N).
The second term in $\tau_{N+1}$ is included in t(N + 1) only under the condition
$$p(d, N+1) \ge p(k, N+1).$$
Thus only these cases must be considered.


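We have $\tau_N = \delta_N + \sigma_N$ and $\tau_{N+1} = \delta_{N+1} + \sigma_{N+1}$, where
$$\delta_N = p(d, N), \quad \sigma_N = p(k, N), \quad \delta_{N+1} = p(d, N+1), \quad \sigma_{N+1} = p(k, N+1),$$
k = k(d, N) in $\tau_{N+1}$ also.
It is easy to show by simple computing that
$$\tau_{N+1} = \left(1 - \frac{B}{N+1}\right)\left[\delta_N\left(1 + \frac{B-d}{N+1-B-S+d}\right) + \sigma_N\left(1 + \frac{B-k}{N+1-B-S+k}\right)\right],$$
and the ratio $\tau_N/\tau_{N+1}$ will be smallest (that is, the 'bad' case for Statement 4) if
$$\delta_{N+1} = \sigma_{N+1},$$
and in this worst case one has
$$\tau_N\left(1 - \frac{B}{N+1}\right) = \tau_{N+1}\left[1 - \frac{2B-(d+k)}{2(N+1-S)}\right].$$
Thus, in the worst case $\tau_N < \tau_{N+1}$ is equivalent to
$$\frac{B}{N+1} < \frac{2B-(d+k)}{2(N+1-S)}$$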
* This difficulty does not arise if the definition D, is replaced by D,, because then Statement 4 is 
e = But in case of D;, this example applies and D, does not improve the situation as com 
wi (5 
† If k does not exist, then s = 0 and Statement 4 is exact.


or 2BS>(N+1)(d+k). 




Now both the members in this ‘bad’ inequality are usually of roughly the same size, because: 


if $2S \le N - 1$ then $d + k < B$,
if $2S \ge N$ then $d + k \ge B$,


and in the case 2S = N + 1 both members are equal and therefore
$$t(d \mid B, S, 2S) = t(d \mid B, S, 2S - 1),$$
and only in the rare cases where the three conditions: 


(1) $óya Ove, 

(2) Oxay 

(3) $2BS > (d + k)(N + 1)$,
are all given is it possible to have $\tau_{N+1} > \tau_N$,


and therefore it is only in these cases to be expected that t(N + 1) > t(N).
Empirically, it was found that, for N < 50 and the levels of 5 and 1 % significance, no inconsistency 
is introduced to the tables by using Statement 4, in spite of its inexactness. 


The author wishes to thank Prof. E. S. Pearson, to whom he owes a redrafting of §§ 1 and 2,
Mr A. G. Arbous and Mr J. E. Kerrich, for their critical remarks and helpful suggestions,
and also the members of the Mathematical Statistics Department of the National Institute
for Personnel Research, for their assistance in computational work and editing of
this paper.


REFERENCES 


BARNARD, G. A. (1947). Significance tests for 2 × 2 tables. Biometrika, 34, 123-38.

COCHRAN, W. G. (1952). The χ² test of goodness of fit. Ann. Math. Statist. 23, 315-45.

FINNEY, D. J. (1948). Tests of significance in a 2 × 2 contingency table. Biometrika, 35, 148.

FISHER, R. A. (1941). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

FISHER, R. A. & YATES, F. (1949). Statistical Tables for Biological, Agricultural and Medical Research.
Edinburgh: Oliver and Boyd.

LATSCHA, R. (1953). Tests of significance in a 2 × 2 contingency table: extension of Finney's tables.
Biometrika, 40, 74.

MAINLAND, D. (1948). Statistical methods in medical research. Tables V and VI. Canad. J. Res. E,
26, 1-116.

TOCHER, K. D. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates.
Biometrika, 37, 130-44.


[Tables for significance tests of 2 × 2 contingency tables, pp. 506-511. The tabulated entries are not
recoverable from this copy; only part of the legend survives.

ARRANGEMENT OF THE ENTRIES: each cell carries one-tailed and two-tailed entries $z_j$; a value
not less than $z_j$ is significant, a value below $z_j$ is not significant.
$D_1$: neglect all + or - signs.
$D_2$: reading rule illegible.]


Note: If no entry is given, the answer is: 
(a) Not significant for N ≤ 50 if the missing entry should be to the right or below any given entry
in the table headed i = d, or somewhere to the right of that table.
(b) Significant two-tailed at the 1 % level if the missing entry should be to the left or above any
given entry in the table headed i = d, or somewhere to the left of that table. For i ≥ 18 all 2 × 2
tables which are possible with N ≤ 50 are two-tailed significant at the 1 % level.




MISCELLANEA 


A note on moving ranges 


By H. A. DAVID 


Commonwealth Scientific and Industrial Research Organization, Sydney, 
and Department of Statistics, University of Melbourne 


1. INTRODUCTION 


Consider a production process in which it takes some time to generate a single item for measurement. 
Under these conditions moving averages and the corresponding moving ranges provide simple current 
measures of location and dispersion. Moreover, with the customary assumption of an underlying normal 
distribution, moving average and range charts may be constructed and used in a very similar fashion 
to ordinary mean and range charts (see, for example, Grant, 1946; Duncan, 1952). One point of difference 
is that for the preliminary estimation of the population standard deviation it is convenient to use the 
set of moving ranges, in samples of size n, that become available as the charts are put in operation. To
obtain this estimate of σ the average of the set, the mean moving range $\bar w_1(n)$, need merely be multiplied
by the well-known scale factor converting a range into an unbiased estimate of σ. Clearly $\bar w_1(2)$ is the
mean successive difference. $\bar w_1(n)$ will, in general, increase with n in its sensitivity to any trends in the
data; but in the absence of such trends $\bar w_1(5)$, for example, will be shown to be considerably more efficient
than $\bar w_1(2)$.

Corresponding to moving ranges we may speak of moving maxima and minima which analogously 
to charts of ordinary maxima and minima (Howell, 1949) are capable of application to quality control. 

This note deals with some properties of moving ranges, maxima and minima. In particular, an 
expression is obtained for the correlation between two ranges in overlapping samples in terms of means, 
variances and covariances of order statistics. For a normal parent population this result is applied to the 
determination of the efficiency of $\bar w_1(n)$. The distribution of runs of equal ranges, which tend to occur in
moving range charts, is also briefly discussed. It is realized that these devices have not been used widely 
and this study is offered in the hope that a clearer understanding of the methods and properties may 
facilitate the decision as to whether their wider application is worth while. 


2. MOVING MAXIMA


Suppose that $x_{(1)}, x_{(2)}, \ldots, x_{(i)}, \ldots$ represent observations arranged in order of time. We may call
$(x_{(i)}, \ldots, x_{(i+n-1)})$ the ith moving sample of n, and shall denote the corresponding maximum, minimum and
range by $m_i(n)$, $m_i'(n)$ and $w_i(n)$ respectively.

The order statistics in a sample of n will be written, following Godwin (1949), as
$$x(1 \mid n) \ge x(2 \mid n) \ge \ldots \ge x(n \mid n);$$
or more briefly as $x_1, x_2, \ldots, x_n$.
To obtain the joint probability
$$P = \Pr[m_i(n) \le X,\ m_{i+d}(n) \le Y] \quad (0 < d < n)$$
it is only necessary to consider the (n + d) values from which $m_i$, $m_{i+d}$ are calculated. We find
$$P = F^n(X)\,F^d(Y) \quad (X \le Y),$$
$$P = F^d(X)\,F^n(Y) \quad (X \ge Y),$$
where $F(X) = \Pr(x \le X)$.


However, we proceed to determine the correlation between $m_i$ and $m_{i+d}$ by a less direct method. The
x's will henceforth be assumed continuous.


Consider first two successive maxima and the set $x_{(1)}, x_{(2)}, \ldots, x_{(n+1)}$ (say) on which they are based.
Then $m_1(n) = m_2(n) = x(1 \mid n+1)$ unless $x_{(1)}$ or $x_{(n+1)}$ is the largest value in the set. In the latter case,
which has probability 2/(n + 1), $m_1$, $m_2$ are $x(1 \mid n+1)$, $x(2 \mid n+1)$. It follows that
$$\mathcal{E}[m_1(n)\,m_2(n)] = \frac{n-1}{n+1}\,\mathcal{E}\{[x(1 \mid n+1)]^2\} + \frac{2}{n+1}\,\mathcal{E}[x(1 \mid n+1)\,x(2 \mid n+1)].$$
Writing $x_i$ for $x(i \mid n+d)$ we find likewise for the case d = 2
$$\mathcal{E}[m_1(n)\,m_3(n)] = \frac{n-2}{n+2}\,\mathcal{E}(x_1^2) + \frac{4}{(n+1)(n+2)}\,[n\,\mathcal{E}(x_1 x_2) + \mathcal{E}(x_1 x_3)],$$
and in general
$$\mathcal{E}(m_i\,m_{i+d}) = \frac{n-d}{n+d}\,\mathcal{E}(x_1^2) + \frac{2n\,d!}{(n+d)!} \sum_{t=1}^{d} \frac{(n+d-t-1)!}{(d-t)!}\,\mathcal{E}(x_1 x_{t+1}).$$

The required correlation is now given by
$$\rho(m_i, m_{i+d}) = [\mathcal{E}(m_i m_{i+d}) - \mathcal{E}^2(m_i)]/\operatorname{var}(m_i).$$
In the case of a normal parent population the first two moments and product-moments of order statistics
have been tabulated for n ≤ 10 (Godwin, 1949), so that ρ can be calculated for n ≤ 5 (see Table 1).
Explicit expressions for ρ can also be given for n ≤ 3 from the exact results for order statistics up to n = 6
(Jones, 1948; Godwin, 1949).
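
The weights attached to the order-statistic moments in the expectation of the product of two maxima are probabilities of mutually exclusive cases, so they should sum to one. The sketch below checks this with exact fractions. It assumes the general expression takes the form weight (n − d)/(n + d) on the squared largest value, plus weight 2n·d!(n + d − t − 1)!/[(n + d)!(d − t)!] on the product of the largest and the (t + 1)th largest value; `maxima_weights` is an illustrative name, not one used in the note.

```python
from fractions import Fraction
from math import factorial


def maxima_weights(n, d):
    """Weights of E(x_1^2) and of E(x_1 x_{t+1}), t = 1..d, for 0 < d < n."""
    w_square = Fraction(n - d, n + d)
    w_cross = {
        t + 1: Fraction(2 * n * factorial(d) * factorial(n + d - t - 1),
                        factorial(n + d) * factorial(d - t))
        for t in range(1, d + 1)
    }
    return w_square, w_cross


for n, d in [(2, 1), (5, 1), (5, 2), (10, 7)]:
    w_square, w_cross = maxima_weights(n, d)
    print(n, d, w_square, w_cross, w_square + sum(w_cross.values()))
```

For d = 1 and d = 2 the weights reduce to the special cases written out above, for example 2/(n + 1) on the product of the two largest values when d = 1.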
3. MOVING RANGES 
To obtain the correlation between two ranges in overlapping samples we note that
$$\mathcal{E}(w_i w_{i+d}) = \mathcal{E}(m_i - m_i')(m_{i+d} - m_{i+d}').$$
For a symmetrical population this reduces to
$$\mathcal{E}(w_i w_{i+d}) = 2[\mathcal{E}(m_i m_{i+d}) - \mathcal{E}(m_i m_{i+d}')].$$
$\mathcal{E}(m_i m_{i+d}')$ can again be expressed in terms of the expectations of products of order statistics in samples
of (n + d). It will be convenient to write $x_j'$ in place of $x_{n+d-j+1}$ to denote the jth smallest value of x. Then
$$\mathcal{E}(m_i m_{i+d}') = \sum_{i=1}^{d+1} \sum_{j=1}^{d+1} p_{ij}\,\mathcal{E}(x_i x_j'),$$
where the coefficients $p_{ij}$ may be obtained as follows.
Split up the (n + d) values, arranged in order of time, into three groups [a], [b] and [c], comprising
respectively the first d, the central (n − d) and the last d of the $x_{(t)}$. Then $p_{ij}$ is the joint probability of
$$\max[a+b] = x_i \quad \text{and} \quad \min[b+c] = x_j',$$
where, for example, max [a + b] stands for the largest $x_{(t)}$ in the first (a + b). Now the joint probability
of the (i − 1) largest x lying in [c] and the (j − 1) smallest in [a] is
$$\frac{(d!)^2\,(n+d-i-j+2)!}{(d-i+1)!\,(d-j+1)!\,(n+d)!}.$$
We next require $x_i$ to lie in [a + b] and $x_j'$ in [b + c]. According as $x_i$ falls into [a] or [b] this joint probability
is respectively
$$\frac{(d-j+1)(n-i+1)}{(n+d-i-j+2)(n+d-i-j+1)} \quad \text{and} \quad \frac{(n-d)(n-i)}{(n+d-i-j+2)(n+d-i-j+1)}.$$
Hence we have
$$p_{ij} = \frac{(d!)^2\,(n+d-i-j)!}{(d-i+1)!\,(d-j+1)!\,(n+d)!}\,[(d-j+1)(n-i+1) + (n-d)(n-i)].$$
The required correlation coefficients are now readily obtained and, for a normal parent, are given in
Table 1 up to n = 5.
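
Since every arrangement of the (n + d) values gives exactly one pair (i, j), the coefficients should sum to one over i, j = 1, ..., d + 1. The following sketch builds each coefficient in the two stages described above and checks that property. One point is an assumption of this sketch rather than something stated in the note: when d = n − 1 and i = j = n only a single value remains unplaced, the largest of [a + b] and the smallest of [b + c] then coincide, and the coefficient is taken to be the first-stage probability alone.

```python
from fractions import Fraction
from math import factorial


def p_coeff(n, d, i, j):
    """Joint probability that max[a+b] is the i-th largest and
    min[b+c] the j-th smallest of the n + d values (0 < d < n)."""
    remaining = n + d - i - j + 2   # values not yet placed after stage one
    stage_one = Fraction(
        factorial(d) ** 2 * factorial(remaining),
        factorial(d - i + 1) * factorial(d - j + 1) * factorial(n + d))
    if remaining == 1:              # assumed edge case: the two extremes coincide
        return stage_one
    numerator = (d - j + 1) * (n - i + 1) + (n - d) * (n - i)
    return stage_one * Fraction(numerator, remaining * (remaining - 1))


def coefficient_table(n, d):
    return {(i, j): p_coeff(n, d, i, j)
            for i in range(1, d + 2) for j in range(1, d + 2)}


table = coefficient_table(3, 1)
print(table, sum(table.values()))
```

For n = 3, d = 1 the four coefficients come out as 7/12, 1/6, 1/6 and 1/12.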


Table 1. Correlations between maxima and ranges in overlapping
samples of n having (n − d) values in common

n    d    Maxima         Ranges
2    1    (illegible)    0.2239
3    1    0.5990         0.5395
3    2    0.2743         0.2062
4    1    0.6923         0.6651
4    2    0.4309         0.3942
4    3    0.2028         0.1732
5    (entries illegible in this copy)


4. THE EFFICIENCY OF MEAN MOVING RANGES

When moving ranges are employed in quality control it is convenient to use as the preliminary estimate
of σ, calculated from N observations, the statistic
$$\bar w_1 = \sum_{i=1}^{N-n+1} w_i/(N-n+1).$$
More generally, a mean moving range $\bar w_r$ may be constructed based on every rth sample.
The efficiency E of $\bar w_r$, defined by
$$E = \operatorname{var} s'/\operatorname{var} \bar w_r = \ldots,$$
is given in Table 2 in a number of cases with n ≤ 5. For purposes of comparison we list E also for the
ordinary mean range estimator $\bar W$ in samples of five and the most efficient (non-overlapping) weighted
mean range estimator w* (see Grubbs & Weaver, 1947).


Table 2. Efficiency of various mean range estimators based on samples of n from N observations

[Table entries not recoverable from this copy.]


It will be noted that E is lowest for $\bar w_1$ with n = 2. This is, of course, the case of the mean successive
difference which is, however, less influenced by trends than the statistics based on larger n.

Compared with $\bar W$ and w* the $\bar w$-statistics are seen to gain in efficiency as N increases. Their relatively
low efficiency for small N is presumably due to the uneven weighting of the observations which is
inherent in their construction.

For fixed n all the statistics $\bar w_r$, $\bar W$ and w* tend to normality with increasing N. This is obvious for $\bar W$
and w* and has, in fact, been proved explicitly for $\bar w_1$ (Hoel, 1946).


5. RUNS OF EQUAL RANGES 


Two ranges in samples of n, with (n − d) observations in common, will be equal if two of these (n − d)
are the extremes in the (n + d) values involved. Thus for any continuous population the probability of
$w_i = w_{i+d}$ is
$$P_{d+1} = [(n-d)(n-d-1)]/[(n+d)(n+d-1)].$$
Clearly, $P_{d+1}$ may be interpreted as the probability of a run of (d + 1) or more equal ranges, so that the
distribution of runs p(r) is determined. It is easily shown that p(r) is J-shaped, flattening out with
increasing n.

The expected number of runs is given by
$$\mathcal{E}(r) = \sum_{d=0}^{n} P_{d+1} = (3n-2) - 2(2n-1)[\psi(2n-1) - \psi(n)],$$
where $\psi(n) = d \log \Gamma(n)/dn$. $\mathcal{E}(r)$ is evaluated in Table 3 for n ≤ 20 together with the expected number
of runs of equal maxima, viz.
$$\sum_{d=0}^{n} (n-d)/(n+d) = n[2\psi(2n) - 2\psi(n) - 1].$$


Table 3. Expected number of runs of equal maxima and ranges
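
The two closed forms can be checked against the defining sums with exact arithmetic, using the fact that for positive integers the difference of two digamma values is a finite sum of reciprocals. The sketch below assumes the sums run over d = 0, ..., n, with summands (n − d)(n − d − 1)/[(n + d)(n + d − 1)] for ranges and (n − d)/(n + d) for maxima; the function names are illustrative.

```python
from fractions import Fraction


def psi_diff(m, n):
    """psi(m) - psi(n) for integers m >= n >= 1, as a sum of reciprocals."""
    return sum((Fraction(1, k) for k in range(n, m)), Fraction(0))


def ranges_sum(n):
    return sum(Fraction((n - d) * (n - d - 1), (n + d) * (n + d - 1))
               for d in range(n + 1))


def ranges_closed(n):
    return (3 * n - 2) - 2 * (2 * n - 1) * psi_diff(2 * n - 1, n)


def maxima_sum(n):
    return sum(Fraction(n - d, n + d) for d in range(n + 1))


def maxima_closed(n):
    return n * (2 * psi_diff(2 * n, n) - 1)


for n in (2, 5, 10, 20):
    print(n, float(maxima_sum(n)), float(ranges_sum(n)))
```

The loop prints values of the kind Table 3 is described as containing, for any n of at least 2.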


REFERENCES


DUNCAN, A. J. (1952). Quality Control and Industrial Statistics. Chicago: Richard D. Irwin.
GRANT, E. L. (1946). Statistical Quality Control. New York and London: McGraw-Hill.
GRUBBS, F. E. & WEAVER, L. C. (1947). J. Amer. Statist. Ass. 42, 224.

HOEL, P. G. (1946). Ann. Math. Statist. 17, 475.

HOWELL, J. M. (1949). Ann. Math. Statist. 20, 305.

JONES, H. L. (1948). Ann. Math. Statist. 19, 270.



Censored samples from truncated normal distributions* 


By A. CLIFFORD COHEN, Jr. 
The University of Georgia 


1. INTRODUCTION AND SUMMARY 


In life-testing and response-time studies, selection procedures sometimes operate to effect a truncation 
at the lower end of the time scale prior to starting a test which is subsequently terminated before all 
sample specimens exhibit the reaction being studied. Resulting samples may thus be regarded as 
censored from a truncated population. The present paper is limited to censored samples from truncated 
normal distributions. It is related to previous studies of truncated and censored samples by Hald 
(1949), the author (1950, 1954), Gupta (1952), and various other writers. It is also related to a paper on 
life testing by Epstein & Sobel (1953), in which some of the advantages of employing censored samples 
to conserve time and test specimens are discussed with regard to a one-parameter exponential distribu- 
tion. For samples of the types considered here, maximum-likelihood estimators (estimates) of the 
population mean and standard deviation are derived, and their asymptotic variances are obtained. 
An illustrative example is given to demonstrate the practical application of these results. 


2. MAXIMUM-LIKELIHOOD ESTIMATION 


The probability density (frequency) function of a normal distribution with mean, m, and standard
deviation, σ, truncated on the left at a fixed terminus, $x_0$, may be written as
$$f(x) = [I_1 \sigma (2\pi)^{1/2}]^{-1} \exp -[(x-m)^2/(2\sigma^2)] \quad (x_0 \le x < \infty),$$
$$f(x) = 0 \quad (x < x_0), \qquad (1)$$
where $I_1 = I(\xi_1)$ is the proportion of the complete distribution retained after truncation and $\xi_1$ is the
left terminus (truncation point) in standard units of the complete distribution, i.e.
$$\xi_1 = (x_0 - m)/\sigma, \quad I(\xi) = \int_{\xi}^{\infty} \phi(t)\,dt, \quad \phi(t) = (2\pi)^{-1/2} \exp(-\tfrac{1}{2}t^2). \qquad (2)$$


Let N sample specimens be selected from a population distributed according to (1) and let their life
spans or reaction times $\{x_i\}$ (i = 1, 2, 3, ..., N) be observed and recorded until a fixed number (n < N)
have reacted. Let observation be discontinued upon determining $x_n$. Thus $x_n$ is the greatest of the
measured observations, and it is known that each of the N − n censored observations exceeds that value.
When x is distributed according to (1), the likelihood function for a sample of the type described is
$$P = K\,(I_1 \sigma)^{-n} \exp\Big\{-\sum_{i=1}^{n} (x_i - m)^2/(2\sigma^2)\Big\}\,(I_2/I_1)^{N-n}, \qquad (3)$$
where K is a constant, $\xi_2 = (x_n - m)/\sigma$ and $I_2 = I(\xi_2)$. (4)


According to Gupta (1952), a sample in which the number of censored (unmeasured) observations is
fixed is designated as Type II censored, as distinguished from Type I censored samples in which the
terminal is fixed. If, rather than stopping a test after n (fixed) observations have been made, it is
terminated at the expiration of a fixed time, then n is a random variable and the resulting sample is of
Type I. The likelihood function of a Type I censored sample differs from (3) only in that K is replaced
by a different constant, and in (4) $x_n$ is replaced by the fixed terminal time. Consequently, the same
estimators are applicable in both cases except for the trivial case of a Type I censored sample in which
n = 0. In this latter situation, the estimators would not be defined.
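Taking logarithms of (3), differentiating and equating to zero, we have
$$\frac{\partial L}{\partial m} = -\frac{N\phi_1}{\sigma I_1} + \frac{1}{\sigma} \sum_{i=1}^{n} \Big(\frac{x_i - m}{\sigma}\Big) + \frac{N-n}{\sigma}\,\frac{\phi_2}{I_2} = 0,$$
$$\frac{\partial L}{\partial \sigma} = -\frac{N\xi_1\phi_1}{\sigma I_1} - \frac{n}{\sigma} + \frac{1}{\sigma} \sum_{i=1}^{n} \Big(\frac{x_i - m}{\sigma}\Big)^2 + \frac{N-n}{\sigma}\,\frac{\xi_2\phi_2}{I_2} = 0, \qquad (5)$$
where L = log P and $\phi_i = \phi(\xi_i)$ (i = 1, 2).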
* Sponsored by the Office of Ordnance Research, U.S. Army. 




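Let $Z_i$ designate the reciprocal of Mill's ratio, i.e.
$$Z_i = \phi_i/I_i = \exp(-\tfrac{1}{2}\xi_i^2) \Big/ \int_{\xi_i}^{\infty} \exp(-\tfrac{1}{2}t^2)\,dt \quad (i = 1, 2). \qquad (6)$$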


Substitute (6) into (5) and simultaneously substitute $m = x_0 - \sigma\xi_1$, which follows from (2). After
simplifying, we obtain the estimating equations
$$\nu_1/w = \{(N/n)Z_1 - [(N-n)/n]Z_2 - \xi_1\}/(\xi_2 - \xi_1), \qquad (a)$$
$$s^2/w^2 = \{1 + (N/n)\xi_1 Z_1 - [(N-n)/n]\xi_2 Z_2 - [(N/n)Z_1 - ((N-n)/n)Z_2]^2\}/(\xi_2 - \xi_1)^2, \qquad (b) \qquad (7)$$
where w is the restricted sample range, $\nu_k$ is the kth sample moment about the left terminus and $s^2$ is
the sample variance, i.e.
$$w = x_n - x_0, \quad \nu_k = \sum_{i=1}^{n} (x_i - x_0)^k/n, \quad s^2 = \nu_2 - \nu_1^2. \qquad (8)$$

The two equations of (7) can be solved simultaneously for $\hat\xi_1$ and $\hat\xi_2$ as illustrated in §4. With these
estimates determined, estimates of the mean and standard deviation follow from (2) and (4) as
$$\hat\sigma = w/(\hat\xi_2 - \hat\xi_1) \quad \text{and} \quad \hat m = x_0 - \hat\sigma\hat\xi_1. \qquad (9)$$

The standard maximum-likelihood symbol (^) serves to distinguish estimates from parameters estimated.

Estimating equations (7) can be expressed in a form that is analogous to corresponding equations
previously given by the author (1950) for doubly truncated normal samples. However, $Z_i$ as used here
is not defined quite the same as in the earlier paper. Here $Z_i = Z(\xi_i)$ is the reciprocal of Mill's ratio and
is a function of $\xi_i$ only, whereas for the doubly truncated cases, $Z_1$ and $Z_2$ were each defined as functions
of both $\xi_1$ and $\xi_2$.

When truncation is on the right and censoring is on the left, it is merely necessary to reverse the signs
of all observations and proceed as for the case discussed above since with f(x) truncated on the right,
f(−x) will be truncated on the left.


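3. VARIANCES OF ESTIMATES
The variance-covariance matrix of $(\hat m, \hat\sigma)$ is derived from the second-order partial derivatives of L.
We let $\phi_{11}(\xi_1, \xi_2)$, $\phi_{12}(\xi_1, \xi_2)$ and $\phi_{22}(\xi_1, \xi_2)$ designate stochastic limits of
$$-\frac{\sigma^2}{n}\frac{\partial^2 L}{\partial m^2}, \quad -\frac{\sigma^2}{n}\frac{\partial^2 L}{\partial m\,\partial\sigma} \quad \text{and} \quad -\frac{\sigma^2}{n}\frac{\partial^2 L}{\partial\sigma^2}$$
as n → ∞. Thus we have
$$\phi_{11}(\xi_1, \xi_2) = 1 - (N/n)Z_1(Z_1 - \xi_1) + [(N-n)/n]Z_2(Z_2 - \xi_2),$$
$$\phi_{12}(\xi_1, \xi_2) = (N/n)Z_1[1 - \xi_1(Z_1 - \xi_1)] - [(N-n)/n]Z_2[1 - \xi_2(Z_2 - \xi_2)], \qquad (10)$$
$$\phi_{22}(\xi_1, \xi_2) = 2 + (N/n)\xi_1 Z_1[1 - \xi_1(Z_1 - \xi_1)] - [(N-n)/n]\xi_2 Z_2[1 - \xi_2(Z_2 - \xi_2)].$$
The asymptotic variances and the covariance are then given as
$$\operatorname{var}(\hat m) \sim [\sigma^2/n][\phi_{22}/(\phi_{11}\phi_{22} - \phi_{12}^2)],$$
$$\operatorname{var}(\hat\sigma) \sim [\sigma^2/n][\phi_{11}/(\phi_{11}\phi_{22} - \phi_{12}^2)], \qquad (11)$$
$$\operatorname{cov}(\hat m, \hat\sigma) \sim -[\sigma^2/n][\phi_{12}/(\phi_{11}\phi_{22} - \phi_{12}^2)],$$
where $\phi_{ij}$ is written for $\phi_{ij}(\xi_1, \xi_2)$.
It subsequently follows that the correlation between $\hat m$ and $\hat\sigma$ is given by
$$\rho(\hat m, \hat\sigma) = -\phi_{12}/(\phi_{11}\phi_{22})^{1/2}. \qquad (12)$$
As given above, (10), (11) and (12) are for the Type II censored case in which N and n are fixed while
$x_n$ is a random variable. These same results also apply in the Type I case, in which N and the terminal
time are fixed while n is a random variable, if we replace n, N/n and (N − n)/n by their expected values,
$$\mathcal{E}(n) = N(I_1 - I_2)/I_1, \quad \mathcal{E}(N/n) = I_1/(I_1 - I_2) \quad \text{and} \quad \mathcal{E}[(N-n)/n] = I_2/(I_1 - I_2).$$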


4. AN ILLUSTRATIVE EXAMPLE


One hundred units of a certain type of electronic device are selected for life testing from a group which
has survived 600 hr. of prior service. The number of units which expired prior to the time of selection is
unknown. The total life in hours is recorded as each of the first ninety units of the sample expires, and
the test is terminated as soon as the ninetieth unit expires. For purposes of this illustration, we consider
logarithms {x} of the life spans to be normally distributed so that for the sample selected, we have:


$$N = 100, \quad n = 90, \quad x_0 = \log 600 = 2.778151, \quad w = 0.235276, \quad \nu_1 = 0.12560147,$$
$$s^2 = 0.003486678183, \quad \nu_1/w = 0.533847354 \quad \text{and} \quad s^2/w^2 = 0.062987824.$$


To solve estimating equations (7), we employ an iterative procedure with initial approximations read
from a chart recently given by the writer (1954) for graphically solving estimating equations of doubly
truncated samples. By this procedure, information provided by knowledge of the number of censored
observations is neglected until subsequent iterations. Thereby we obtain $\xi_1^{(0)} = -1.60$ and $\xi_2^{(0)} = 1.21$.
To improve these initial approximations, we substitute $\xi_1^{(0)} = -1.600$ into (7b) and by inverse
interpolation obtain $\xi_2^{(1)} = 1.302$. This value is then substituted into (7a) and we find $\xi_1^{(1)} = -1.663$. Repeating
the cycle, we find $\xi_2^{(2)} = 1.287$ and $\xi_1^{(2)} = -1.626$. In the notation employed here $\xi_i^{(j)}$ is the jth
approximation to $\xi_i$. Closer approximations to the required estimates could be reached by continuing the iterations
described above through additional cycles. However, greater improvement can be achieved with less
labour if we take advantage of the fact that the iterants already obtained locate two points, $P_1(\xi_1^{(0)}, \xi_2^{(1)})$
and $P_2(\xi_1^{(1)}, \xi_2^{(2)})$, which lie on the curve defined by (7b), and two points, $P_3(\xi_1^{(1)}, \xi_2^{(1)})$ and $P_4(\xi_1^{(2)}, \xi_2^{(2)})$,
which lie on the curve defined by (7a). We approximate the two curves in this vicinity by straight lines
through these two pairs of points, and coordinates of their intersection provide improved
approximations to the required estimates.

Since $P_1$ and $P_3$ each have the same ordinate, and since $P_2$ and $P_4$ likewise have the same ordinate,
the required coordinates are readily determined by interpolation as summarized below.


$\xi_2$      $\xi_1$ by (7a)    $\xi_1$ by (7b)    Diff.

1.302      −1.663           −1.600           −0.063
1.293      −1.640           −1.640            0
1.287      −1.626           −1.663           +0.037


Thus we have $\hat\xi_1 = -1.640$ and $\hat\xi_2 = 1.293$, and from (9) it follows that
$$\hat\sigma_x = 0.080217, \quad \hat m_x = 2.90971.$$


Since x = log T, where T is the life in hours, we estimate the mean life in hours as $\hat m_T = 812.3$ hr.

If additional decimal places had been required in the above results, we could have continued the
iterative process described above through one or more further cycles.

Variances and covariances of the above estimates may be computed using (10) and (11). Although
ordinary tables of normal curve areas and ordinates are adequate for this purpose, tables provided by
Sampford (1952) greatly facilitate the computations. Sampford tabulated
$$A = Z(Z - \xi) \quad \text{and} \quad \epsilon = Z[1 - \xi(Z - \xi)],$$
with η designating the argument rather than ξ as here. In using his tables, however, a word of caution
is necessary since an unfortunate printing error resulted in negative signs before some of the entries of
ε whereas all of these entries should be positive.

With $\hat\xi_1 = -1.640$ and $\hat\xi_2 = 1.293$, we interpolate from Sampford's tables, and using (10) compute
$\hat\phi_{11} = 0.87961$, $\hat\phi_{12} = 0.39418$ and $\hat\phi_{22} = 1.12908$. On substituting these values in (11) and (12), we
compute
$$\operatorname{var}(\hat m) \sim 0.0000964, \quad \operatorname{var}(\hat\sigma) \sim 0.0000751 \quad \text{and} \quad \rho = -0.3955.$$
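
The figures of this example can be re-derived without tables. The sketch below is a check under stated assumptions, not the author's procedure: it takes the reported estimates −1.640 and 1.293, evaluates the reciprocal of Mill's ratio from the error function of the Python standard library, and assumes the forms of (7), (9), (10), (11) and (12) in which I(ξ) is the upper-tail area of the unit normal curve. All variable names are illustrative.

```python
from math import erfc, exp, pi, sqrt


def Z(xi):
    """Reciprocal of Mill's ratio: normal ordinate over upper-tail area."""
    ordinate = exp(-0.5 * xi * xi) / sqrt(2.0 * pi)
    upper_tail = 0.5 * erfc(xi / sqrt(2.0))
    return ordinate / upper_tail


N, n = 100, 90
x0, w = 2.778151, 0.235276
xi1, xi2 = -1.640, 1.293            # estimates reported in the example
a, c = N / n, (N - n) / n
Z1, Z2 = Z(xi1), Z(xi2)

# Right-hand sides of (7a) and (7b) at the reported estimates
rhs_7a = (a * Z1 - c * Z2 - xi1) / (xi2 - xi1)
rhs_7b = (1 + a * xi1 * Z1 - c * xi2 * Z2 - (a * Z1 - c * Z2) ** 2) / (xi2 - xi1) ** 2

# Equation (9)
sigma_hat = w / (xi2 - xi1)
m_hat = x0 - sigma_hat * xi1

# Equations (10), (11) and (12)
g1 = Z1 * (1 - xi1 * (Z1 - xi1))
g2 = Z2 * (1 - xi2 * (Z2 - xi2))
phi11 = 1 - a * Z1 * (Z1 - xi1) + c * Z2 * (Z2 - xi2)
phi12 = a * g1 - c * g2
phi22 = 2 + a * xi1 * g1 - c * xi2 * g2
det = phi11 * phi22 - phi12 ** 2
var_m = sigma_hat ** 2 / n * phi22 / det
var_sigma = sigma_hat ** 2 / n * phi11 / det
rho = -phi12 / sqrt(phi11 * phi22)

print(rhs_7a, rhs_7b)
print(sigma_hat, m_hat, 10 ** m_hat)
print(phi11, phi12, phi22, var_m, var_sigma, rho)
```

With three-decimal estimates the two right-hand sides agree with the sample ratios 0.533847 and 0.062988 only to about four decimal places, which is consistent with the remark that further cycles would be needed for more figures.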


REFERENCES 


COHEN, A. C., Jr. (1950). Estimating the mean and variance of normal populations from singly
truncated and doubly truncated samples. Ann. Math. Statist. 21, 557.
COHEN, A. C., Jr. (1954). Truncated and censored samples. Tech. Rep. no. 11, Ordnance research
contract DA-01-009-ORD-288. Univ. Ga. Dep. Math.
EPSTEIN, B. & SOBEL, M. (1953). Life testing. J. Amer. Statist. Ass. 48, 486.
GUPTA, A. K. (1952). Estimation of the mean and standard deviation of a normal population from
a censored sample. Biometrika, 39, 260.
HALD, A. (1949). Maximum likelihood estimation of the parameters of a normal distribution which is
truncated at a known point. Skand. AktuarTidskr. 32, 119.
SAMPFORD, M. R. (1952). The estimation of response-time distributions. II. Biometrics, 8, 307.


The rapid calculation of χ² as a test of homogeneity from a 2 × n table


By J. B. S. HALDANE
Department of Biometry, University College, London


In genetical work we very frequently have to test a (2 x n)-fold table for homogeneity. We count, for 
example, non-recombinants and recombinants in an experiment on linkage. The frequency of the latter 
is unknown or only roughly known before the experiment. Only if the values found in several families 
are homogeneous have we the right to assume that the value derived from the total has the precision 
given by its standard sampling error. Suppose our table to be 


That is to say the rth sample of $s_r$ members consists of $a_r$ members of one class and $b_r$ of the other, while
$\Sigma a_r = A$, $\Sigma b_r = B$, $A + B = N$. The exact contribution of the rth sample to χ² as a test of homogeneity
is
$$\frac{(Ba_r - Ab_r)^2}{ABs_r},$$
or one of many equivalent expressions.
Suppose A > B. If k is the nearest integer to $AB^{-1}$, (A − kB)/N may be small, say < 0.1. If not we can
take a convergent k/h to A/B, where h and k are small integers, and (hA − kB)/N will be small.
Then the following is an exact expression for the 'homogeneity χ²' with n − 1 degrees of freedom,
where h may be unity:
$$\chi^2 = \frac{N^2}{(h+k)^2 AB} \Big[\sum_{r=1}^{n} (ha_r - kb_r)^2 s_r^{-1} - (hA - kB)^2 N^{-1}\Big], \qquad (1)$$
whilst the following is in error by a factor of less than $(hA - kB)/(hN)$:
$$\chi^2 = \frac{1}{hk} \Big[\sum_{r=1}^{n} (ha_r - kb_r)^2 s_r^{-1} - (hA - kB)^2 N^{-1}\Big]. \qquad (2)$$
If A and B are of the order of 1000, $(Ba_r - Ab_r)^2$ will be an integer with about five more figures than
$(ha_r - kb_r)^2$, and the saving of labour is very considerable, while for smaller values most of the calculations
can be reduced to mental arithmetic. The proof is as follows.


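Let $\chi_r^2 = (Ba_r - Ab_r)^2/(ABs_r)$, so that $\chi^2 = \Sigma\chi_r^2$.
Let $x_r = a_r - As_r N^{-1} = Bs_r N^{-1} - b_r$.
Then $\Sigma x_r = 0$ and $\chi^2 = N^2 A^{-1} B^{-1}\,\Sigma x_r^2 s_r^{-1}$.
Also $ha_r - kb_r = (hA - kB)s_r N^{-1} + (h + k)x_r$.
So
$$(ha_r - kb_r)^2 s_r^{-1} = (hA - kB)^2 N^{-2} s_r + 2(h+k)(hA - kB)N^{-1} x_r + (h+k)^2 x_r^2 s_r^{-1}$$
$$= (hA - kB)^2 N^{-2} s_r + 2(h+k)(hA - kB)N^{-1} x_r + (h+k)^2 ABN^{-2}\chi_r^2.$$
Hence
$$\Sigma[(ha_r - kb_r)^2 s_r^{-1}] = (hA - kB)^2 N^{-1} + (h+k)^2 ABN^{-2}\chi^2,$$
whence (1) follows. But if hA − kB = D, then
$$A = \frac{kN}{h+k} + \frac{D}{h+k}, \quad B = \frac{hN}{h+k} - \frac{D}{h+k}, \quad \frac{N^2}{(h+k)^2 AB} = \frac{1}{hk}\Big[1 + \frac{(h-k)D}{hkN} - \frac{D^2}{hkN^2}\Big]^{-1},$$
whence (2) follows.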

As an example I take Table 1, where $a_r$ and $b_r$ are the numbers of normal and vestigial Drosophila
melanogaster counted in eleven bottles in a class experiment where a 3:1 ratio was expected on
Mendelian theory, but vestigial was known to be somewhat inviable. The question arose whether the
mortality of recessives had been significantly different in different bottles. Clearly h = 1, k = 6, gives a
good fit.


Table 1

$a_r$     $b_r$     $s_r$     $a_r - 6b_r$     $(a_r - 6b_r)^2/s_r$
 25       1      26        +19            13.885
 80      15      95        −10             1.053
 38      12      50        −34            23.120
 52       8      60        + 4             0.267
  9       0       9        + 9             9.000
 21       7      28        −21            15.750
 33       6      39        − 3             0.231
 24       2      26        +12             5.538
 30       7      37        −12             3.892
 51       7      58        + 9             1.397
 56       3      59        +38            24.475
419      68     487        +11            98.608


Here h = 1, k = 6, A = 419, B = 68, N = 487, $\sum_{r=1}^{n} (a_r - 6b_r)^2 s_r^{-1} - (A - 6B)^2 N^{-1} = 98.360$. So formula
(2) gives χ² = 16.393, formula (1) χ² = 16.709. There are 10 degrees of freedom, so 0.09 > P > 0.08, and
the data are not significantly heterogeneous. Had the expression $\Sigma[(Ba_r - Ab_r)^2 s_r^{-1}]/(AB)$ been used,
the first summand would have been $1281^2/26$. In practice only two decimal places were used in the
sum, and formula (2) was used, the whole calculation being completed on the blackboard in about
ten minutes.
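
The arithmetic of the example can be repeated exactly. The sketch below is illustrative only: the $a_r$ are recovered as $s_r - b_r$, and the eighth $b_r$ is taken as 2, the value implied by the tabulated difference +12 and by the column total B = 68. It evaluates the direct sum of contributions, formula (1) and formula (2), and confirms that (1) agrees with the direct sum for any h and k.

```python
from fractions import Fraction

b = [1, 15, 12, 8, 0, 7, 6, 2, 7, 7, 3]
s = [26, 95, 50, 60, 9, 28, 39, 26, 37, 58, 59]
a = [sr - br for sr, br in zip(s, b)]   # normals = bottle total minus vestigials
A, B = sum(a), sum(b)
N = A + B


def bracket(h, k):
    """Sum of (h a_r - k b_r)^2 / s_r, less (hA - kB)^2 / N."""
    total = sum(Fraction((h * ar - k * br) ** 2, sr)
                for ar, br, sr in zip(a, b, s))
    return total - Fraction((h * A - k * B) ** 2, N)


def chi2_direct():
    return sum(Fraction((B * ar - A * br) ** 2, A * B * sr)
               for ar, br, sr in zip(a, b, s))


def chi2_formula1(h, k):
    return Fraction(N * N, (h + k) ** 2 * A * B) * bracket(h, k)


def chi2_formula2(h, k):
    return bracket(h, k) / (h * k)


print(A, B, N)
print(float(chi2_direct()), float(chi2_formula1(1, 6)), float(chi2_formula2(1, 6)))
```

The ratio of the two formulae here is about 1.019, which matches the error of 2 % quoted for this example.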

It is possible to justify formula (2) verbally as follows. Apart from the common factor $(hk)^{-1}$,
$\Sigma[(ha_r - kb_r)^2 s_r^{-1}]$ is the value of χ² for n degrees of freedom, if we assume
$\mathcal{E}(a_r) = ks_r/(h + k)$; while $(hA - kB)^2 N^{-1}$ is the value of χ² for A and B
on the same assumption. Subtracting this we obtain χ² with n − 1 degrees of freedom. In fact, the
dissection of a χ² into components is only valid when all samples are large, and in this case leads to
an error of 2 %. While this dissection is often justifiable, it is desirable to know the magnitude of its error.


The ‘Inefficiency’ of the sample median for many familiar symmetric distributions* 
By JOHN T. CHU, University of North Carolina 


1. A LOWER BOUND FOR THE VARIANCE OF THE MEDIAN 


If the reciprocal of the (asymptotic) variance of an estimate is taken as a measure of its (asymptotic)
efficiency, the sample median $\tilde x$ is often (asymptotically) less efficient than the sample mean $\bar x$, for many
symmetric distributions familiar to statisticians. In fact, for a symmetric distribution having its
absolute maximum at the point of symmetry, if $\tilde x$ is asymptotically less efficient than $\bar x$, then quite often
$\tilde x$ is never so efficient as $\bar x$, with the possible exception of very small samples. To show these facts, we first
derive a very simple, yet sharp, lower bound for the variance of the sample median.


* Sponsored by the Office of Naval Research. 


Suppose that F(x) and f(x) are the cumulative distribution function and the probability density
function of a certain continuous distribution, and f(x) is symmetric with respect to x = ξ, and f(ξ) ≥ f(x)
for all x. Let $\tilde x$ be the median of a sample of size 2n + 1; then, writing $C_n = (2n+1)!/(n!\,n!)$,
$$\sigma_{\tilde x}^2 = \int_{-\infty}^{\infty} (x - \xi)^2\,C_n [F(x)]^n [1 - F(x)]^n f(x)\,dx$$
$$= \int_0^1 (x - \xi)^2\,C_n F^n (1 - F)^n\,dF$$
$$\ge [f(\xi)]^{-2} \int_0^1 (F - \tfrac{1}{2})^2\,C_n F^n (1 - F)^n\,dF$$
$$= \{4[f(\xi)]^2 (2n + 3)\}^{-1}. \qquad (1)$$
The equality holds for a rectangular distribution.


2. EXAMPLES 
It is well known that for normal and rectangular distributions $\bar{x}$ is more efficient than $\tilde{x}$. We shall show
that this is true for many other familiar symmetric distributions.

(2·1) Triangular distribution. $f(x) = 1-|x|$, $|x| < 1$. The variance of $\bar{x}$ is $\tfrac16(2n+1)^{-1}$ and from (1)
it follows that $\tilde{x}$ is less efficient than $\bar{x}$ for samples of sizes $2n+1$ provided $n > 1$. Direct computation
shows that this is also true if $n = 1$.

(2·2) t-Distribution. $f(t) = A(1+t^2/k)^{-\frac12(k+1)}$, where $A^{-1} = \Gamma(\tfrac12 k)\,(k\pi)^{\frac12}/\Gamma(\tfrac12(k+1))$. Since $\sigma^2 = k/(k-2)$,
it follows that for a t-distribution with $k$ degrees of freedom, $\tilde{x}$ is less efficient than $\bar{x}$ if (but not
necessarily only if)

$$\frac{(k-2)\,\pi\,\Gamma^2(\tfrac12 k)}{4\,\Gamma^2(\tfrac12(k+1))} > \frac{2n+3}{2n+1}.$$

For both $k = 2m$ and $k = 2m+1$, the left-hand side of the above inequality is an increasing function of $m$.
Computation shows, e.g. that the inequality holds if $k > 5$ and $n > 25$.
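(2·3) Symmetric $\beta$-distribution.

$$f(x) = \frac{\Gamma(2p)}{[\Gamma(p)]^2}\,x^{p-1}(1-x)^{p-1} \qquad (0 < x < 1;\ p > 0).$$

For this distribution $\sigma^2 = \tfrac14(2p+1)^{-1}$. Hence $\tilde{x}$ is less efficient than $\bar{x}$ if

$$\frac{2^{4p-4}\,(2p+1)\,\Gamma^4(p)}{\Gamma^2(2p)} > \frac{2n+3}{2n+1}.$$

The left-hand side becomes smaller if $p$ is replaced by $p+1$, and tends to $\tfrac12\pi$ as $p \to \infty$. So it has a lower
bound $\tfrac12\pi$. Hence the inequality holds for every $p > 0$ and $n > 2$.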

(2·4) Cauchy type distribution. This is defined to be of the type $f(x) = C_\alpha/(1+|x|^\alpha)$ $(-\infty < x < \infty;\ \alpha > 1)$.
If $\alpha = 2$, we obtain the well-known Cauchy distribution for which $\tilde{x}$ is infinitely more efficient than $\bar{x}$.
It would be interesting to examine whether or not $\bar{x}$ becomes more efficient as $\alpha$ increases. Now $\bar{x}$ has
a finite variance only if $\alpha > 3$. $C_\alpha$ and var $x$ can be obtained by using contour integration (see Whittaker
& Watson, 1952, p. 118). It follows that $\tilde{x}$ is less efficient than $\bar{x}$ if

$$\frac{y^2 \sin 3y}{\sin^3 y} > \frac{2n+3}{2n+1},$$

where $y = \pi/\alpha$. The left-hand side is a decreasing function of $y$, and so an increasing function of $\alpha$. The
least values of $\alpha$ for which the left-hand side is equal to 5/3 and 1 (the maximum and minimum of the
right-hand side), are found to be 4·65 and 3·75 approximately.
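The inequalities of examples (2·1) to (2·4) are easy to evaluate numerically. The sketch below is illustrative only: the function names are hypothetical, and the left-hand sides are coded from the inequalities as reconstructed here, with `lgamma` used to avoid overflow in the gamma functions.

```python
from math import exp, lgamma, log, pi, sin


def rhs(n):
    """Right-hand side (2n + 3)/(2n + 1) shared by the inequalities."""
    return (2 * n + 3) / (2 * n + 1)


def triangular_median_less_efficient(n):
    """(2.1): bound (1) with f(0) = 1 against the variance of the mean."""
    return 1 / (4 * (2 * n + 3)) > 1 / (6 * (2 * n + 1))


def t_lhs(k):
    """(2.2): (k - 2) pi Gamma(k/2)^2 / {4 Gamma((k + 1)/2)^2}."""
    return (k - 2) * pi * exp(2 * lgamma(k / 2) - 2 * lgamma((k + 1) / 2)) / 4


def beta_lhs(p):
    """(2.3): 2^(4p - 4) (2p + 1) Gamma(p)^4 / Gamma(2p)^2, via logarithms."""
    log_value = ((4 * p - 4) * log(2) + log(2 * p + 1)
                 + 4 * lgamma(p) - 2 * lgamma(2 * p))
    return exp(log_value)


def cauchy_type_lhs(alpha):
    """(2.4): y^2 sin(3y) / sin(y)^3 with y = pi/alpha."""
    y = pi / alpha
    return y ** 2 * sin(3 * y) / sin(y) ** 3


print(triangular_median_less_efficient(2))   # sufficient condition at n = 2
print(t_lhs(5), rhs(25))                     # t with 5 degrees of freedom
print(beta_lhs(1), beta_lhs(50), pi / 2)     # decreasing towards pi/2
print(cauchy_type_lhs(4.65), cauchy_type_lhs(3.75))
```

For the triangular case the bound alone decides the matter only from $n = 2$ onwards, which is why the text treats $n = 1$ by direct computation.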


3. REMARKS 


(i) $\bar{x}$ is not more efficient than $\tilde{x}$ for all symmetric distributions. When the parent population has a
Laplace distribution, for example, $\tilde{x}$ is more efficient for all samples of odd sizes (Chu & Hotelling).

(ii) If $f(x)$ satisfies certain continuity conditions, $\tilde{x}$ has an asymptotically normal distribution and the
asymptotic variance is $\{4[f(\xi)]^2(2n+1)\}^{-1}$. If the sample size, therefore, is not too small, the asymptotic
variance is for all practical purposes a lower bound for the variance of $\tilde{x}$. And if $\tilde{x}$ is asymptotically less
efficient than $\bar{x}$, then $\tilde{x}$ is less efficient than $\bar{x}$ for all samples whose sizes are not too small.


REFERENCES 


CHU, J. T. & HOTELLING, H. The moments of the sample median (to be published).
WHITTAKER, E. T. & WATSON, G. N. (1952). Modern Analysis. Cambridge University Press.




A simple method of calculating the exact probability in 2x2 contingency 
tables with small marginal totals 


By P. H. LESLIE 
Bureau of Animal Population, Department of Zoological Field Studies, Oxford 


It frequently happens that in the case of data which may be arranged in the form of a 2 x 2 contingency 
table, we wish to determine the exact probability of obtaining the observed result. In the method which 
appears to be adopted usually, we have to calculate the value of a number of expressions involving 
factorials. Although the actual calculations are not difficult, they become somewhat tedious even with 
the help of a table of log factorials, and a simple method of obtaining the exact solution may therefore 
be of practical use. The following method is convenient when the marginal totals are small (and usually 
it is only in such cases that we need to calculate the exact solution), and only a table of the binomial 
coefficients up to, say, n = 20 is required. Tables of these coefficients seem to be published up to n = 12,
e.g. Barlow, and Milne-Thomson & Comrie, four-figure tables; but they may be extended once and for 
all to higher values very simply by means of Pascal’s triangle. 
Suppose we write any 2 x 2 table in the standard form: 


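$$
\begin{array}{cc|c}
x & & n_A \\
 & & N-n_A \\
\hline
n_B & N-n_B & N
\end{array}
$$

where $N$ is the total number of observations, and the marginal totals fulfil the following inequalities,
$$n_B \le N-n_A, \quad n_B \le N-n_B \quad\text{and}\quad n_B \le n_A;$$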


so that $n_B$ is the smallest marginal total, and $x$ the number of observations in the cell with the smallest
expectation. Then the set of possible results which is compatible with the given marginal totals is
obtained by giving $x$ in turn the successive values $0, 1, 2, \ldots, n_B$, and the probability of obtaining a
particular value of $x$ is given by the appropriate term of the hypergeometric distribution,

$$P(x) = \binom{n_A}{x}\binom{N-n_A}{n_B-x}\bigg/\binom{N}{n_B}.$$

If, as a first step, we write down opposite the values of $x = 0, 1, 2, \ldots, n_B$, starting at $x = 0$, the
successive binomial coefficients for $n = n_A$, which we will call column $a_x$; and if, secondly, starting at the
bottom opposite $x = n_B$, we write down in reverse order the binomial coefficients for $n = N-n_A$, and
call this column $b_x$; then the product column $a_x b_x$ gives the successive terms of the distribution we require.
The total of this last column is

$$\sum a_x b_x = \frac{N!}{n_B!\,(N-n_B)!},$$

which serves as a check if it is needed. The calculation of the exact probability $\sum_0^x P(x)$ or $\sum_x^{n_B} P(x)$
(whichever tail of the distribution is appropriate in the particular case), then follows very quickly on a
calculating machine; in fact, the individual product terms, $a_x b_x$, need not necessarily be
written down.

As an arithmetical example, suppose we had the following table, where $n_A = 9$, $n_B = 8$, $N = 20$
and $x = 7$,

$$
\begin{array}{cc|c}
7 & 2 & 9 \\
1 & 10 & 11 \\
\hline
8 & 12 & 20
\end{array}
$$


Then the two essential columns are the binomial coefficients for $n = 9$, forming column $a_x$, and those
for $n = 11$, forming column $b_x$. Thus

   x     a_x     b_x
   0       1     165
   1       9     330
   2      36     462
   3      84     462
   4     126     330
   5     126     165
   6      84      55
   7      36      11
   8       9       1

Then, the sum of the product column, $\sum a_x b_x = 125{,}970$, which can be obtained directly on the machine
without writing down the individual terms. Since the observed number, $x = 7$, in the cell with the
smallest expectation is greater than expectation, we would require in this case the right-hand tail of the
distribution. Thus
$$(36 \times 11) + (9 \times 1) = 405,$$
and
$$\sum_{x=7}^{8} P(x) = 405/125970 = 0{\cdot}003215.$$
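The two-column procedure translates directly into a few lines of code. This is a minimal sketch of the method as described above, with hypothetical function names; `math.comb` stands in for the table of binomial coefficients.

```python
from math import comb


def leslie_columns(n_a, n_b, total):
    """Columns a_x and b_x for x = 0, 1, ..., n_b.

    a_x are the binomial coefficients for n = n_a read downwards from x = 0;
    b_x are those for n = total - n_a written in reverse from x = n_b.
    """
    xs = range(n_b + 1)
    a = [comb(n_a, x) for x in xs]
    b = [comb(total - n_a, n_b - x) for x in xs]
    return a, b


def tail_count(n_a, n_b, total, x, upper=True):
    """Sum of a_x * b_x over the chosen tail, with the column total as check."""
    a, b = leslie_columns(n_a, n_b, total)
    products = [ai * bi for ai, bi in zip(a, b)]
    column_total = sum(products)
    # Check described in the text: the product column sums to N!/(n_B!(N-n_B)!).
    assert column_total == comb(total, n_b)
    tail = products[x:] if upper else products[: x + 1]
    return sum(tail), column_total


numerator, denominator = tail_count(9, 8, 20, 7)
print(numerator, denominator, numerator / denominator)
```

Keeping the numerator and denominator as integers mirrors the hand calculation; the division is left to the last step.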


A test for a change in a parameter occurring at an unknown point 


By E. S. PAGE
Department of Mathematics, University of Durham 


1. INTRODUCTION AND SUMMARY 


Consider a sample of independent observations in the order in which they were obtained, $x_1 \ldots x_n$; it is
sometimes required to test the null hypothesis that all the observations are drawn from the same
population with distribution function $F(x \mid \theta)$ against the alternative that $x_1, \ldots, x_m$ come from $F(x \mid \theta)$,
and $x_{m+1}, \ldots, x_n$ from $F(x \mid \theta')$ $(\theta' \ne \theta)$. If $m$ is known this is a straightforward problem of comparing two
samples. In this paper we suppose that $m$ is unknown; this raises new problems. A test is proposed for
a case where $\theta$ is known and some comments are made on the problems presented by other cases.


2. ONE-SIDED CASE: $\theta$ KNOWN


Suppose that the initial value, $\theta$, is known. One possibility for a test is to regard all the observations as
a single sample and to use the best test that all the observations are from $F(x \mid \theta)$ against the alternative
that all are from $F(x \mid \theta')$ for some $\theta' \ne \theta$. Such a test cannot be expected to be very powerful if the change
occurs late in the sample; the few observations on the new parameter value would be obscured by the
many on the old parameter. Since the problem of detecting a change in a parameter is important in
controlling the quality of the output from a continuous production process, it is reasonable to
investigate whether the methods of process inspection schemes can provide useful tests for the case in
which we are interested.


A process inspection scheme for detecting a change in one direction in the parameter was given by the
author in an earlier paper (Page, 1954). We suppose throughout that the parameter under consideration
is the mean of the distribution unless the contrary is stated, and in this section we further assume that
the value, $\theta$, at the start of the observations is known. The scheme consisted of recording the cumulative
sums $S_r = \sum_{i=1}^{r}(x_i - \theta)$, $S_0 = 0$, and taking action to rectify a possible change in the parameter when
$S_r - \min_{0 \le i < r} S_i \ge h$, i.e. when the sample path rises a height $h$ above its previous minimum value; this
procedure can be displayed clearly on a chart. If there is no change in $\theta$, the mean path of the cumulative
sum is horizontal, while if an increase in $\theta$ occurs the new mean path has positive slope so that the above
criterion would be satisfied without too much delay. The significance test suggested by this procedure
is as follows:

I. Given the observations $x_1 \ldots x_n$. It is required to test the hypothesis that the mean is constantly $\theta$.
Use as the test statistic $m = \max_{0 \le r \le n}\{S_r - \min_{0 \le i \le r} S_i\}$, where $S_r = \sum_{j=1}^{r}(x_j - \theta)$, $S_0 = 0$, taking large values as
significant, i.e. reject the hypothesis if $m \ge h$.

It was shown in the paper cited that the properties of the corresponding process inspection scheme 
depended upon the characteristics of linear sequential tests; as the test I is a truncated form of the 
process inspection scheme it is to be expected that in general the properties of the test will be difficult 
to evaluate. A special case that is tractable is where the observations are nought-or-one binomial 
variables. Accordingly, we consider a test for the general case using binomial variables. 

II. Given the observations $x_1 \ldots x_n$. It is required to test the hypothesis that the mean is constantly $\theta$.
Let $y_i = a$ if $x_i - \theta \ge 0$ and $y_i = -b$ if $x_i - \theta < 0$, and choose $a, b$ $(> 0)$ so that $E(y_i \mid \theta) = 0$ $(i = 1, \ldots, n)$.
Use as the test statistic $m = \max_{0 \le r \le n}\{S_r - \min_{0 \le i \le r} S_i\}$, where $S_r = \sum_{j=1}^{r} y_j$, $S_0 = 0$, taking large values as
significant, i.e. reject the hypothesis if $m \ge h$.

For simplicity we shall consider only the case where the distribution of the $x_i$ is symmetrical, so that
we can take $a = b = 1$; hence $y_i = \operatorname{sgn}(x_i - \theta)$. In order to evaluate the properties of the test let
$m_r = S_r - \min_{0 \le i \le r} S_i$ and let $p_{r,i}$ be the probability that $m_r = i$ $(i = 0, 1, \ldots, h-1)$ and that $m_s < h$ for all $s$,
$1 \le s \le r$. Then

$$1 - \sum_{i=0}^{h-1} p_{r,i} \qquad (1)$$

is the probability that the null hypothesis is rejected. Let prob $(y_i = 1)$ be $p = 1-q$. By considering the
result of the next observation we have the relations

$$
\begin{aligned}
p_{r+1,0} &= q\,(p_{r,0} + p_{r,1}),\\
p_{r+1,i} &= p\,p_{r,i-1} + q\,p_{r,i+1} \qquad (1 \le i < h-1),\\
p_{r+1,h-1} &= p\,p_{r,h-2}.
\end{aligned}
\qquad (2)
$$

In matrix notation we have

$$\mathbf{p}_{r+1} = \mathbf{P}\,\mathbf{p}_r, \qquad (3)$$

where $\mathbf{P}$ is the square matrix,

$$
\mathbf{P} =
\begin{pmatrix}
q & q & 0 & 0 & \cdots & 0 & 0\\
p & 0 & q & 0 & \cdots & 0 & 0\\
0 & p & 0 & q & \cdots & 0 & 0\\
\vdots & & & & & & \vdots\\
0 & 0 & 0 & 0 & \cdots & p & 0
\end{pmatrix}. \qquad (4)
$$

Initially $p_{0,0} = 1$, $p_{0,i} = 0$ $(i \ne 0)$. In this formulation we are implicitly using the fact that the $m_r$ are
variables in a Markov chain with $h$ live states. Clearly

$$\mathbf{p}_r = \mathbf{P}^r\,\mathbf{p}_0. \qquad (5)$$

The expression for $\mathbf{p}_r$ may alternatively be written in terms of the latent roots and vectors, but the
simplicity of the matrix $\mathbf{P}$ makes it quite convenient to use (4) for calculation. On the null hypothesis
$p = \tfrac12$ constantly. Table 1 shows values of $h$ and the sample size $n$ for which the probabilities of errors
of the first kind are at most $\alpha$, where $\alpha = 0{\cdot}05$ and $0{\cdot}01$. In order to ensure that the Type I errors are at
most $\alpha$ for non-tabular values of $n$ the larger value of $h$ should be taken. For larger values of $n$ rough
interpolation will provide a sufficiently accurate value of $h$.

The power of the test II depends both on the value of $p$ after the change and the position of the change.
If prob $(y_i = +1)$ is constantly equal to $p$ so that the change can be considered as having occurred
immediately, the probability that the null hypothesis is rejected in a sample of $n$ observations is given
by equations (1) and (5) with $r = n$. If the change occurs after the $k$th observation the value of $\mathbf{p}_n$ to be
used in (1) is given by

$$\mathbf{p}_n = \mathbf{P}_1^{\,n-k}\,\mathbf{P}_0^{\,k}\,\mathbf{p}_0,$$

where $\mathbf{P}_0$, $\mathbf{P}_1$ are the matrix $\mathbf{P}$ with $p = \tfrac12$, $p = p_1$ respectively.
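The recurrence relations (2) can be iterated directly, which avoids forming the matrix $\mathbf{P}$. The sketch below is an illustration under stated assumptions, not the author's program: the function names are hypothetical, rejection is taken to occur when the statistic reaches $h$, and the probability of a positive sign is supplied observation by observation so that a change after the $k$th observation is handled by switching from $\tfrac12$ to $p_1$.

```python
def rejection_probability(sign_probabilities, h):
    """Probability that S_r - min S_i reaches h within the given observations.

    sign_probabilities[j] is prob(y_{j+1} = +1).  The vector `state` holds
    p_{r,0}, ..., p_{r,h-1}, the probabilities of the h live states.
    """
    state = [1.0] + [0.0] * (h - 1)          # p_{0,0} = 1, p_{0,i} = 0
    for p in sign_probabilities:
        q = 1.0 - p
        new_state = [0.0] * h
        for i in range(h):
            above = state[i + 1] if i + 1 < h else 0.0
            if i == 0:
                new_state[0] = q * (state[0] + above)      # first line of (2)
            else:
                new_state[i] = p * state[i - 1] + q * above
        state = new_state
    return 1.0 - sum(state)                   # expression (1)


def power_with_change(n, h, k, p1):
    """Change from p = 1/2 to p = p1 after the kth of n observations."""
    return rejection_probability([0.5] * k + [p1] * (n - k), h)


print(power_with_change(50, 16, 50, 0.75))   # no change: error of first kind
print(power_with_change(50, 16, 0, 0.75))    # change from the outset
print(power_with_change(50, 16, 25, 0.75))   # change at the middle
```

Supplying one probability per observation is a design choice of this sketch; it reproduces the product $\mathbf{P}_1^{\,n-k}\mathbf{P}_0^{\,k}\mathbf{p}_0$ without any matrix arithmetic.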






Table 1. Values of n and h


In Table 2 the power of the test II is compared with the power of the simple binomial test with
approximately the same probability of Type I errors for a sample of 50 observations. For test II to have
probability of Type I errors just less than 0·05 we need $h = 16$. The corresponding single sample test is
'Reject the null hypothesis if more than 31 of the $y_i$ are positive', i.e. if $\sum_{i=1}^{50} y_i > 12$. The loss of power
from using Test II instead of the single sample test is remarkably small.


Table 2. Powers of the tests for different р 


   p       Test II    Single sample test

 0·50      0·039      0·032
 0·55      0·136      0·127
 0·60      0·336      0·336
 0·65      0·609      0·622
 0·70      0·844      0·859
0-75 0-971 


0-997 


The same two tests are contrasted in Table 3, where their powers are shown for different positions of
the change from $p = 0{\cdot}5$ to $p = 0{\cdot}75$. Here, however, the test II has an appreciably greater power than
the single-sample test when the change occurs near the middle of the set of observations. Also shown in
Table 3 are the powers of the single-sample test on the last $50 - m$ observations, when it is known that
the change has occurred immediately before the $m$th observation. The difference between the power of
this test and that of test II gives an indication of what is lost from the ignorance of the position of change.

In order to illustrate the test we give an example constructed from tables of random normal deviates.
A sample of forty observations was constructed, the first twenty having mean 5 and unit variance, and
the last twenty having mean 6 and unit variance; these are shown in Table 4. Suppose that it is required
to test the hypothesis that the mean is constantly 5 against the alternatives that an increase in the mean
has occurred within the sample. The observations, $x_i$, are shown in Table 4 together with $y_i = \operatorname{sgn}(x_i - 5)$,
and the value taken by $S_r - \min S_i$.

The greatest value, $m$, of $S_r - \min S_i$ in the sample of 40 is 17, which approaches the 1% significance
point given in Table 1 (for $n = 40$, the approximate 5% point is $h = 14$, the approximate 1% point is
$h = 18$). This significance level can be compared with that obtained from other tests applied to the




Table 3. Powers of the tests for different positions of the change

   m     Test II     Single-sample test     Single-sample test,
                     on whole sample        m known

                     0·971                  0·971
                     0·864                  0·946
                     0·625                  0·894
                     0·330                  0·618
                     0·122                  0·244
                     0·032                  0·032


Table 4. Artificial sampling experiment

Observation no.        1     2     3     4     5     6     7     8     9    10
Value of x_i         3·95  5·96  6·22  5·58  4·02  4·97  3·46  4·29  4·65  5·66
y_i = sgn (x_i − 5)   −1    +1    +1    +1    −1    −1    −1    −1    −1    +1
S_r − min S_i          0     1     2     3     2     1     0     0     0     1

Observation no.       11    12    13    14    15    16    17    18    19    20
Value of x_i         5·44  5·91  4·98  3·58  5·26  3·98  4·19  6·66  6·05  5·97
y_i = sgn (x_i − 5)   +1    +1    −1    −1    +1    −1    −1    +1    +1    +1
S_r − min S_i          2     3     2     1     2     1     0     1     2     3

Observation no.       21    22    23    24    25    26    27    28    29    30
Value of x_i         7·14  6·22  4·76  6·60  5·72  4·88  5·44  5·03  5·66  5·56
y_i = sgn (x_i − 5)   +1    +1    −1    +1    +1    −1    +1    +1    +1    +1
S_r − min S_i          4     5     4     5     6     5     6     7     8     9

Observation no.       31    32    33    34    35    36    37    38    39    40
Value of x_i         6·37  6·66  5·10  5·80  6·29  5·49  4·93  6·18  8·29  6·84
y_i = sgn (x_i − 5)   +1    +1    +1    +1    +1    +1    −1    +1    +1    +1
S_r − min S_i         10    11    12    13    14    15    14    15    16    17


sample. The single-sample binomial test on the y's has 26 positives, 14 negatives; on the null hypothesis
the probability of this or a larger number of positives is 0·04. The change in the mean causes the estimate
of variance of the x's to be inflated, and a t-test fails to give significance. The computation required by
test II is so simple that it is unnecessary to record the y's, or even the $S_r - \min S_i$. An additional
advantage of the test is that it gives an indication where the change took place; the position of the last zero
of $S_r - \min S_i$ is an estimate (of course, biased) of the position of change. Thus in the example we would
suspect that the change had occurred near observation 17.
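The working of the example can be reproduced in a few lines. This is a minimal sketch with hypothetical names; the forty observations are transcribed from Table 4 as read from the scan, so individual digits should be treated as assumptions, although only the sign of each $x_i - 5$ enters the calculation.

```python
TABLE_4_X = [
    3.95, 5.96, 6.22, 5.58, 4.02, 4.97, 3.46, 4.29, 4.65, 5.66,
    5.44, 5.91, 4.98, 3.58, 5.26, 3.98, 4.19, 6.66, 6.05, 5.97,
    7.14, 6.22, 4.76, 6.60, 5.72, 4.88, 5.44, 5.03, 5.66, 5.56,
    6.37, 6.66, 5.10, 5.80, 6.29, 5.49, 4.93, 6.18, 8.29, 6.84,
]


def cusum_excess(observations, theta):
    """Running values of S_r - min S_i for y_i = sgn(x_i - theta)."""
    running_sum = 0
    lowest = 0                      # S_0 = 0 is included in the minimum
    excess = []
    for x in observations:
        running_sum += 1 if x >= theta else -1
        lowest = min(lowest, running_sum)
        excess.append(running_sum - lowest)
    return excess


excess = cusum_excess(TABLE_4_X, 5.0)
statistic = max(excess)
# Position (1-based) of the last zero, the suggested estimate of the change point.
last_zero = max(r for r, value in enumerate(excess, start=1) if value == 0)
print(statistic, last_zero)
```

Only the running sum and its minimum need be carried, which is the sense in which the text says the y's need not be recorded.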


3. GENERAL REMARKS 


In this section we comment briefly on some other possible methods for the problem of § 2 and related
problems without investigating their properties.

Another test for a change in one direction of a parameter from a known specified value may be obtained
by analogy with the standard control chart process inspection scheme. The sample is divided into a
number of subsamples of equal size and a statistic calculated from each subsample; the hypothesis of
no change is rejected unless all the statistics fall within a certain range. The properties of this test are
easy to evaluate and the number of subsamples and the permissible interval for the statistics can be
chosen to control the errors. The test is also easy to apply and it is frequently useful in rough work.
However the temporal ordering of the observations enters only into the division into subsamples,
and it is of interest to examine whether it is advantageous to employ a slightly more complicated test of
the form 'Reject the null hypothesis if any k of the statistics calculated from m consecutive subsamples
fall outside an interval J, or if any one falls outside a wider interval J′' (cf. Wilkinson, 1951; Tippett,
1931).

The control chart procedure can also provide a test for the two-sided case where the change from the
known value can be in either direction. A test based on the mean path of a cumulative sum similar to
test I is a truncated sequential test (Rao, 1950). Another case that needs to be considered is where the
initial value of the parameter is unknown.


I wish to thank Dr D. R. Cox for a number of discussions on the subject of this paper, and the Director,
Mathematical Laboratory, Cambridge, for permission to use the EDSAC.


REFERENCES 


PAGE, E. S. (1954). Biometrika, 41, 100.
RAO, C. R. (1950). Sankhyā, 10, 361.
TIPPETT, L. H. C. (1931). The Methods of Statistics. London: Williams and Norgate.
WILKINSON, B. (1951). Psychol. Bull. 48, 156.


А paradox in statistical estimation 


By ALAN STUART
Division of Research Techniques, London School of Economics 


1. Sundrum (1954) has recently shown to be incorrect the intuitive idea that the more efficient of
two estimators of a parameter necessarily provides the more powerful test of a hypothesis concerning
that parameter. This note discusses a similar paradox which arises in a problem concerned purely with
estimation: given an estimator $u$ of a parameter $\theta$ in a multiparameter distribution, one does not
necessarily improve its efficiency by substituting true parameter values into $u$ to replace estimators of
them.

2. We shall consider the case where there is only one other parameter, say $\mu$, and where we have
a consistent estimator of $\theta$

$$t = f(s, \mu), \qquad (1)$$

where $s$ is a function of the $n$ observations only. If $\mu$ is unknown, we reduce $t$ to a function of the
observations only by substituting for $\mu$ a consistent estimator of it, say $m$, giving

$$u = f(s, m). \qquad (2)$$

From (1) we have, to order $n^{-1}$,

$$V(t) = \left(\frac{\partial t}{\partial s}\right)^2 V(s). \qquad (3)$$

From (2) we have the corresponding result for a function of two random variables

$$V(u) = \left(\frac{\partial u}{\partial s}\right)^2 V(s) + \left(\frac{\partial u}{\partial m}\right)^2 V(m) + 2\,\frac{\partial u}{\partial s}\frac{\partial u}{\partial m}\,C(s, m). \qquad (4)$$

Since all the derivatives in (3) and (4) are to be taken at the true parameter point $(\theta, \mu)$, the first term
on the right of (4) is equal to (3). Thus

$$V(u) - V(t) = \left(\frac{\partial u}{\partial m}\right)^2 V(m) + 2\,\frac{\partial u}{\partial s}\frac{\partial u}{\partial m}\,C(s, m). \qquad (5)$$

(5) is not generally positive, although it must be so if $s$ and $m$ are uncorrelated. In general, their
correlation must be taken into account before the effect of substituting parameters into $u$ can be
assessed. It is the correlation term in (5) which resolves the paradox.


3. An example in which (5) is negative is provided by the estimation of the correlation parameter $\rho$
of a bivariate normal distribution. A consistent estimator is provided by the sample correlation
coefficient $r$, which has a large-sample variance,

$$V(r) = \frac{1}{n}(1-\rho^2)^2. \qquad (6)$$

Although $r$ is the maximum-likelihood estimator of $\rho$ when all five population parameters are
simultaneously estimated, it is not an efficient estimator when the population means and variances
are known, which is the case we shall consider. One might therefore expect to improve its efficiency
by substituting the known parameter values for their estimators, giving

$$r' = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu_1)(y_i-\mu_2)}{\sigma_1\sigma_2}. \qquad (7)$$

We find

$$E(r') = \rho, \qquad (8)$$

$$
\begin{aligned}
n^2\sigma_1^2\sigma_2^2\,E(r'^2) &= E\Bigl\{\sum_i\sum_j (x_i-\mu_1)(y_i-\mu_2)(x_j-\mu_1)(y_j-\mu_2)\Bigr\}\\
&= n\mu_{22} + n(n-1)\rho^2\sigma_1^2\sigma_2^2.
\end{aligned}
\qquad (9)
$$

Since $\mu_{22} = (1+2\rho^2)\sigma_1^2\sigma_2^2$, we have from (8) and (9)

$$V(r') = \frac{1}{n}(1+\rho^2), \qquad (10)$$

and comparing (10) with (6) we see that, far from improving the accuracy of estimation, the substitution
of true parameter values for random variables in $r$ has multiplied efficiency by a factor $(1-\rho^2)^2/(1+\rho^2)$, which
tends to zero for large $\rho^2$, and is 1 only when $\rho^2 = 0$.
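4. This paradoxical effect can be examined through (5) by restricting our attention to the case where
both population means are zero and both variances equal to $\sigma^2$, which remains unknown. Our estimator
(7) then becomes

$$t = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i}{\sigma^2},$$

and with a common population variance, we should in our correlation coefficient use a pooled estimator
of $\sigma^2$, giving

$$u = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i}{\frac{1}{2n}\sum_{i=1}^{n}(x_i^2+y_i^2)}.$$

We shall see below that $u$ has the same efficiency as $r$.
In the notation of (1) and (2) we have

$$t = \frac{s}{\mu} \quad\text{and}\quad u = \frac{s}{m},$$

where $s = \frac{1}{n}\sum x_i y_i$, $\mu = \sigma^2$ and $m = \frac{1}{2n}\sum(x_i^2+y_i^2)$.
We now require

$$V(m) = \frac{\sigma^4}{n}(1+\rho^2), \qquad C(s, m) = \frac{2\rho\sigma^4}{n}, \qquad (11)$$

$$\frac{\partial u}{\partial s} = \frac{1}{\sigma^2}, \qquad \frac{\partial u}{\partial m} = -\frac{\rho}{\sigma^2}. \qquad (12)$$

Substituting (11) and (12) into (5), we find

$$V(u) - V(t) = \frac{\rho^2}{n}(1+\rho^2) - \frac{4\rho^2}{n} = -\frac{\rho^2}{n}(3-\rho^2), \qquad (13)$$

which is negative whenever $\rho^2 \ne 0$, confirming the result on efficiency given above.

5. Finally, we may confirm that $u$ is as efficient as $r$. Using the general formula for the large-sample
variance of a ratio estimator,

$$V\!\left(\frac{s}{m}\right) = \left\{\frac{E(s)}{E(m)}\right\}^2\left[\frac{V(s)}{E^2(s)} + \frac{V(m)}{E^2(m)} - \frac{2\,C(s, m)}{E(s)\,E(m)}\right]. \qquad (14)$$

Using (8), (10) and (11) in (14), we obtain

$$V(u) = \rho^2\left[\frac{1+\rho^2}{n\rho^2} + \frac{1+\rho^2}{n} - \frac{4}{n}\right] = \frac{1}{n}(1-\rho^2)^2, \qquad (15)$$

agreeing with (6).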
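The algebra leading from (5) to (13) and (15) can be checked with exact fractions. The sketch below is illustrative only; the function names are hypothetical, and the expressions for $V(m)$, $C(s, m)$ and the two derivatives are the large-sample values for the zero-mean, common-variance bivariate normal case, entered here as assumptions of the reconstruction.

```python
from fractions import Fraction


def var_r(rho, n):
    """Large-sample variance of the sample correlation coefficient."""
    return (1 - rho ** 2) ** 2 / n


def var_r_known_parameters(rho, n):
    """Large-sample variance when true means and variances are substituted."""
    return (1 + rho ** 2) / n


def difference_from_formula_5(rho, n, sigma_sq):
    """V(u) - V(t) from (5), with u = s/m and t = s/sigma^2."""
    var_m = sigma_sq ** 2 * (1 + rho ** 2) / n      # V(m)
    cov_sm = 2 * rho * sigma_sq ** 2 / n            # C(s, m)
    du_ds = 1 / sigma_sq                            # at the true parameter point
    du_dm = -rho / sigma_sq
    return du_dm ** 2 * var_m + 2 * du_ds * du_dm * cov_sm


rho, n, sigma_sq = Fraction(1, 2), 100, Fraction(3)
lhs = difference_from_formula_5(rho, n, sigma_sq)
print(lhs, var_r(rho, n) - var_r_known_parameters(rho, n))
```

The common variance cancels, as it must, so the difference depends on $\rho$ and $n$ alone.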
REFERENCE 


SUNDRUM, R. M. (1954). On the relation between estimating efficiency and the power of tests.
Biometrika, 41, 542–4.


Cumulants of a transformed variate 


By G. S. JAMES
University of Leeds 


1. Suppose that $x$ is a variate whose $p$th cumulant, denoted by $\kappa_{px}$, is of order $\nu^{-p+1}$, where $\nu$ is some
'large' number. One consequence of this is that $x$ is approximately normally distributed with a variance
of order $\nu^{-1}$. Thus if $y$ is some well-behaved function of $x$, say $y = f(x)$, then $y$ will also be approximately
normally distributed with mean and variance given by

$$\kappa_{1y} = E y = f(\mu)\,[1 + O(\nu^{-1})], \qquad (1)$$
$$\kappa_{2y} = \operatorname{var} y = [f'(\mu)]^2 \operatorname{var} x\,[1 + O(\nu^{-1})], \qquad (2)$$

where $\mu$ is the mean value of $x$. These are of orders 1 and $\nu^{-1}$ respectively.

It is by no means obvious, however, that the $p$th cumulant of $y$, denoted by $\kappa_{py}$, is of order $\nu^{-p+1}$ for
general values of $p$, and not merely for $p = 1$ and 2. The object of this note is to show that this is true,
at any rate formally. (The result will be true in reality, as well as formally, under the same sort of
conditions for which (1) and (2) are really true.)

Suppose that $f(x)$ can be expanded formally in a Taylor series round $\mu$; there is no loss of generality
in taking $\mu = 0$, so that this series is

$$y = c_0 + c_1 x + c_2 x^2 + \ldots, \qquad (3)$$

where we suppose that $c_0, c_1, c_2, \ldots$ do not depend upon $\nu$. It is quite easy to show, by formal expansion
starting from (3), that the $p$th moment of $y$ about its mean is of order $\nu^{-[\frac12(p+1)]}$, where $[k]$ denotes the
integral part of $k$. But on conversion to cumulants the higher order terms always seem to cancel, to
such an extent that $\kappa_{py}$ is of order $\nu^{-p+1}$. We now give a proof of this result.

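THEOREM 1. If a variate $x$ has cumulants $\kappa_{px}$ of all orders, and if $\kappa_{px} = O(\nu^{-p+1})$ $(p = 1, 2, \ldots)$,
and the cumulants of $y = f(x)$ are calculated on the basis of a (possibly formal) Taylor expansion (3),
where the $c_r$ do not depend upon $\nu$, then $\kappa_{py} = O(\nu^{-p+1})$ $(p = 1, 2, \ldots)$.

Without loss of generality we may assume that $c_0 = 0$. We may then write $y = \sum z_r$, where $z_r = c_r x^r$
$(r = 1, 2, \ldots)$. Writing $\mu_z(r_1 \ldots r_p)$ for the moment $E z_{r_1} \cdots z_{r_p}$ (some of the $r_i$ may be equal) and $\kappa_z(r_1 \ldots r_p)$
for the corresponding cumulant, we have, by the properties of moment-generating and cumulant-
generating functions,

$$
\begin{aligned}
\frac{1}{1!}\kappa_{1y}t + \frac{1}{2!}\kappa_{2y}t^2 + \ldots &= \log E \exp\Bigl(\sum t_r z_r\Bigr)\\
&= \log\Bigl[1 + \frac{1}{1!}\sum t_r\,\mu_z(r) + \frac{1}{2!}\sum t_r t_s\,\mu_z(rs) + \ldots\Bigr]\\
&= \frac{1}{1!}\sum t_r\,\kappa_z(r) + \frac{1}{2!}\sum t_r t_s\,\kappa_z(rs) + \ldots,
\end{aligned}
\qquad (4)
$$

where $t_1 = t_2 = \ldots = t$ and the summations are over $r, s, \ldots = 1, 2, \ldots$. Hence

$$\kappa_{py} = \sum \kappa_z(r_1 \ldots r_p), \qquad (5)$$

summed for $r_1, \ldots, r_p = 1, 2, \ldots$.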

Thus to prove the theorem it is sufficient to show that $\kappa_z(r_1 \ldots r_p)$ is of order $\nu^{-p+1}$ or lower. To
demonstrate this, suppose rather more generally that $x_1, \ldots, x_n$ is a sample of $n$ independent values from the
$x$ distribution, and let $z_r = c_r(\sum x_i)^r$ $(r = 1, 2, \ldots)$. This agrees with the earlier definition of $z_r$ when $n = 1$
(and $x_1 = x$). The $z_r$ can be regarded as statistics of the sample $x_1, \ldots, x_n$, although of course they are
functionally related. Each $z_r$ is a homogeneous polynomial symmetric function of the sample values,
of degree $r$, and the coefficient of $x_1^{a_1} \cdots x_q^{a_q}$ in $z_r$ (with $\sum a_i = r$) is

$$\frac{r!}{a_1! \cdots a_q!}\,B_{zr}(a_1 \ldots a_q), \qquad (6)$$

where

$$B_{zr}(a_1 \ldots a_q) = c_r. \qquad (7)$$

A corresponding result holds for Fisher's statistics $k_r$, but with

$$B_{kr}(a_1 \ldots a_q) = \frac{(-1)^{q-1}(q-1)!}{n(n-1)\cdots(n-q+1)}. \qquad (8)$$

The factors $B_{zr}$ and $B_{kr}$ happen not to depend on the complete detail of the partition $(a_1 \ldots a_q)$ of the
number $r$; but I have shown (James, 1955) that Fisher's rules for obtaining the sampling cumulants of
k-statistics, described in Kendall's book (Kendall, 1947), may be adapted to any system of statistics
$z_1, z_2, \ldots$ ($z_r$ being a homogeneous polynomial symmetric function of degree $r$ in the observations) by
merely replacing the factors $B_{kr}$ occurring in the evaluation of the 'pattern functions' by the factors
$B_{zr}$, where $B_{zr}$ is the coefficient of $x_1^{a_1} \cdots x_q^{a_q}$ in $z_r$ divided by the appropriate multinomial coefficient.

The rule of Fisher which is of paramount importance for our proof is the one which states that, in 
finding cumulants of k- (or z-) statistics in terms of population cumulants, we are to neglect any array 
which splits up into two or more disjoint blocks. Now any array which does not fall into disjoint blocks 
may be built up column by column in such a way that at each stage the new column does not form an 
isolated block by itself. (For example, the columns of a four-column array whose first two columns have
no row in common may be taken in the order 1324, but not in the order 1234.) Thus, if $\alpha_1, \ldots, \alpha_p$ are
the numbers of non-zero elements in the 1st, ..., $p$th columns and $\tau$ is the total number of rows in the
array, we have

$$\tau \le \alpha_1 + (\alpha_2 - 1) + \ldots + (\alpha_p - 1) = \sum \alpha_i - p + 1. \qquad (9)$$

Now if this array is one of those contributing to the coefficient of $\kappa_{t_1 x} \cdots \kappa_{t_\tau x}$ in $\kappa_z(r_1 \ldots r_p)$ (so that $\sum t_i = \sum r_i$)
then the corresponding term is of order $\nu^{\sum(-t_i+1)} = \nu^{-\sum r_i + \tau}$; for the numerical factors and pattern
functions for each separation of the array do not depend upon $\nu$, while each $\kappa_{tx}$ is of order $\nu^{-t+1}$. But (9)
shows that

$$-\sum r_i + \tau \le -\sum (r_i - \alpha_i) - p + 1 \le -p + 1; \qquad (10)$$

for $\alpha_i$ is the number of parts in a partition of $r_i$, and so does not exceed $r_i$. Thus this particular term of
$\kappa_z(r_1 \ldots r_p)$ is of order $\nu^{-p+1}$ or lower. Hence the same is true of $\kappa_z(r_1 \ldots r_p)$ itself, and finally, by (5),
of $\kappa_{py}$.

2. By the use of multivariate sampling rules more general results than Theorem 1 can be proved.
Perhaps the most general is Theorem 2.

THEOREM 2. If $y^1 = f^1(x^1, \ldots, x^q)$, $y^2 = f^2(x^1, \ldots, x^q)$, ... are functions of the variates* $x^1, \ldots, x^q$
formally expansible in the forms

$$y^i = c^i + c^i_j x^j + c^i_{jk} x^j x^k + \ldots, \qquad (11)$$

and if the $p$th-order cumulant, $\kappa_x^{i_1 \ldots i_p}$, of $x^{i_1}, \ldots, x^{i_p}$ is of order $\nu^{-p+1}$ for $i_1, \ldots, i_p = 1, \ldots, q$ and $p = 1, 2, \ldots$
then the same holds for the cumulants, $\kappa_y^{i_1 \ldots i_p}$, of the $y^i$. (Here we use a slightly modified form of the
'tensor' notation suggested by Kaplan (1952).)

* The superfixes are indices, not exponents.

An outline of the proof is as follows. If $z_1, z_2, \ldots$ denote the quantities $x^1, \ldots, x^q$, $x^1x^1, x^1x^2, \ldots, x^qx^q$,
$x^1x^1x^1, \ldots$ written in some convenient order, then (11) may be rewritten in the form

$$y^i = \sum_s c^i_s z_s, \qquad (12)$$

whence we easily derive

$$\kappa_y^{i_1 \ldots i_p} = \sum_{s_1, \ldots, s_p} c^{i_1}_{s_1} \cdots c^{i_p}_{s_p}\,\kappa_z(s_1 \ldots s_p), \qquad (13)$$

where $\kappa_z(s_1 \ldots s_p)$ is the mixed cumulant of $z_{s_1}, \ldots, z_{s_p}$. Thus it suffices to show that all the $\kappa_z(s_1 \ldots s_p)$
are of order $\nu^{-p+1}$ or lower, for $p = 1, 2, \ldots$. The proof is completed in much the same way as before, but
in order to see clearly the application of the results of my other paper the z's should be relabelled as
follows: $z^i = x^i$, $z^{ij} = x^ix^j$, ....

3. An example. Johnson & Welch (1939) have found by direct calculation, using Stirling's asymptotic
expansion of the gamma function, that if $\chi^2$ is the sum of squares of $\nu$ independent standard normal
deviates, then the cumulants of $\chi$, as far as the sixth, are given asymptotically as follows:


күй, Ka d, 
Kay 7, Kay ^ d | (14) 
Ky~ иб, Ku — ҢУЗ. 


Now if we write $x = \chi^2/\nu - 1$, $y = \sqrt{(1+x)} = \chi/\sqrt{\nu}$, then $\kappa_{px}$ is of order $\nu^{-p+1}$, so that Theorem 1 shows
that $\kappa_{py}$ is of order $\nu^{-p+1}$ or lower and $\kappa_{p\chi}$ is of order $\nu^{-\frac12 p+1}$ or lower. Thus the odd cumulants in (14)
are of the order we should expect, but the even ones (apart from the second) seem to be a whole order
lower than demanded by Theorem 1. I have been unable to prove that $\kappa_{2p,\chi} = O(\nu^{-p})$ for general values
of $p$ $(\ge 2)$.
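The orders given by Theorem 1 for the first three cumulants of $\chi$ can be seen numerically from the exact moments. The sketch below is illustrative and not taken from Johnson & Welch; the function name is hypothetical, and it assumes the standard results $E\chi = \sqrt{2}\,\Gamma(\tfrac12(\nu+1))/\Gamma(\tfrac12\nu)$, $E\chi^2 = \nu$ and $E\chi^3 = (\nu+1)E\chi$.

```python
from math import exp, lgamma, sqrt


def chi_cumulants(nu):
    """First three cumulants of chi with nu degrees of freedom."""
    m1 = sqrt(2.0) * exp(lgamma((nu + 1) / 2) - lgamma(nu / 2))
    m2 = float(nu)
    m3 = (nu + 1) * m1
    k1 = m1
    k2 = m2 - m1 ** 2
    k3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    return k1, k2, k3


for nu in (25, 100, 400):
    k1, k2, k3 = chi_cumulants(nu)
    # Rescaled by the orders nu^(1/2), nu^0 and nu^(-1/2) given by Theorem 1.
    print(nu, k1 / sqrt(nu), k2, k3 * sqrt(nu))
```

Under these assumptions the rescaled values settle near 1, $\tfrac12$ and $\tfrac14$ as $\nu$ grows. The same direct route becomes numerically delicate for the higher cumulants, where heavy cancellation occurs.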


REFERENCES 


JAMES, G. S. (1955). On moments and cumulants of systems of statistics. (Not yet published.)
JOHNSON, N. L. & WELCH, B. L. (1939). On the calculation of the cumulants of the $\chi$-distribution.
Biometrika, 31, 216–18.
KAPLAN, E. L. (1952). Tensor notation and the sampling cumulants of k-statistics. Biometrika, 39,
319–23.
KENDALL, M. G. (1947). The Advanced Theory of Statistics, vol. 1, 3rd ed. London: Charles Griffin
and Co.


The likelihood ratio test for Markoff chains 
By I. J. GOOD


1. The present paper is virtually a footnote to that of Hoel (1954), who, using and acknowledging
the methods of Bartlett (1951), constructed a likelihood ratio test for the order of a Markoff chain. If
$H_\nu$ is the hypothesis that the chain is of order $\nu$, then Hoel tests $H_{\nu-1}$ against (or rather within) $H_\nu$.
Here we make a simple generalization so as to test $H_\mu$ within $H_\nu$, and we relate the work to some previous
work.

2. Let $x_1, x_2, \ldots, x_N$ be a sequence, $\mathfrak{S}$, of observations, each observation being capable of taking one
of $t$ values denoted conventionally by $1, 2, \ldots, t$. Let $H_\nu$ be the hypothesis that $\mathfrak{S}$ is a Markoff chain of
order $\nu$ $(\nu = 0, 1, 2, \ldots)$. Note that $H_0$ means that $\mathfrak{S}$ is a random sequence. We may obtain uniformity
in expression by defining $H_{-1}$ to mean that $\mathfrak{S}$ is a 'perfectly' random or 'equiprobably' random sequence.
$H_{-1}$ is the only $H_\nu$ that is a simple statistical hypothesis. Clearly $H_\mu$ implies $H_\nu$ whenever $\mu < \nu$.

The likelihood ratio test for composite hypotheses may be expressed in the following form. Let $H$
and $H'$ be hypotheses such that $H$ implies $H'$, so that $H'$ at least is not a simple statistical hypothesis.
Let $E$ be an experimental result and let

$$\lambda = \max P(E \mid H^*)/\max P(E \mid H'^*),$$

the maxima being taken over all simple statistical hypotheses $H^*$, $H'^*$ belonging to $H$ and $H'$ respectively.
$\lambda$ is the likelihood ratio statistic for composite hypotheses and we may say that it tests $H$ within $H'$.
We may call $H$ the null hypothesis whether or not it is simple and (within orthodox statistics) $\lambda$ can be
used for the rejection of $H$.

Hoel (1954) used $\lambda$ for testing $H_{\nu-1}$ within $H_\nu$ $(\nu \ge 1)$, and, with suitable conventions, his result applies
also for $\nu = 0$. Bartlett (1951) used the likelihood ratio statistic for testing completely specified chains
of order $\nu - 1$ within $H_\nu$. Since $H_{-1}$ can be regarded as a completely specified chain of any order, Bartlett's
work included tests for $H_{-1}$ within $H_\nu$ $(\nu = 0, 1, 2, \ldots)$. Similarly, a completely specified chain of order $\mu$
is also one of order $\nu - 1$ (if $\mu < \nu$), so that Bartlett's work applies for testing any completely specified
chain of order $\mu$ within $H_\nu$ where $\mu < \nu$.

We shall here generalize Hoel's work in order to test $H_\mu$ within $H_\nu$ where $\mu < \nu$. Only when $\mu = -1$
does the generalization overlap with Bartlett's results, since the only $H_\mu$ that is completely specified
is $H_{-1}$. (A completely specified statistical hypothesis is the same thing as a simple statistical hypothesis.)

3. Associated with G is the corresponding cyclic sequence & defined by regarding the first element 
of © as immediately following its last one. We shall always denote properties of © by placing а bar over 
the corresponding algebraie symbol relating to ©. 

A sequence of v consecutive observations is called a v-sequence. Let m,,...,,, or n, for short, be the 
number of v-sequences in © which are (r4, fa ...,7,) = г. Let 7i, be defined similarly. Let 


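$$\psi_\nu^2 = \sum_r \frac{\{n_r - (N-\nu+1)t^{-\nu}\}^2}{(N-\nu+1)t^{-\nu}} \quad (\nu \ge 1), \tag{1}$$

$$\bar\psi_\nu^2 = \sum_r \frac{(\bar n_r - Nt^{-\nu})^2}{Nt^{-\nu}} \quad (\nu \ge 1), \tag{2}$$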

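$$K_\nu = 2\sum_r n_r \log n_r \quad (\nu \ge 1), \tag{4}$$

$$K_0 = 2(N-\nu+1)\log(N-\nu+1), \quad K_{-1} = 0, \tag{5}$$

$$\bar K_\nu = 2\sum_r \bar n_r \log \bar n_r \quad (\nu \ge 1), \tag{6}$$

$$\bar K_0 = 2N\log N, \quad \bar K_{-1} = 0, \tag{7}$$

$$\nabla\psi_\nu^2 = \psi_\nu^2 - \psi_{\nu-1}^2, \text{ etc.}, \tag{8}$$

$$\nabla^2\psi_\nu^2 = \psi_\nu^2 - 2\psi_{\nu-1}^2 + \psi_{\nu-2}^2, \text{ etc.} \tag{9}$$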


(The logarithms are to base e and 0 log 0 means 0.) 
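It is known that if $H_{-1}$ is true, then $\psi_\nu^2$ and $\bar\psi_\nu^2$ do not have asymptotically gamma variate (chi-squared) distributions, but $\nabla\psi_\nu^2$, $\nabla\bar\psi_\nu^2$, $\nabla^2\psi_\nu^2$ and $\nabla^2\bar\psi_\nu^2$ do (see Bartlett, 1951; Good, 1953). Moreover (precisely),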
Фрі = Фу? = Р -1, (10) 
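$\nabla\psi_\nu^2$ and $\nabla\bar\psi_\nu^2$ have $\nabla t^\nu$ degrees of freedom ($\nu \ge 1$), and $\nabla^2\psi_\nu^2$ and $\nabla^2\bar\psi_\nu^2$ have $\nabla^2 t^\nu$ degrees of freedom ($\nu \ge 2$).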
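Hoel defined $\lambda$ as the ratio of the maximum likelihood given $H_{\nu-1}$ to that given $H_\nu$ and obtained the asymptotic distribution of $\lambda$, given $H_{\nu-1}$. Let us denote by $\lambda_{\mu,\nu}$ the ratio of the maximum likelihood given $H_\mu$ to that given $H_\nu$ ($\mu < \nu$). Then Hoel's $\lambda$ is our $\lambda_{\nu-1,\nu}$, and obviously $\lambda_{\mu,\nu}$ is the product of $\nu - \mu$ factors:

$$\lambda_{\mu,\nu} = \lambda_{\mu,\mu+1}\,\lambda_{\mu+1,\mu+2} \cdots \lambda_{\nu-1,\nu}. \tag{11}$$

The expression given by Hoel for $-2\log\lambda_{\nu-1,\nu}$ may be written in the form

$$-2\log\lambda_{\nu-1,\nu} = \nabla^2 K_{\nu+1} \quad (\nu = 1, 2, 3, \ldots). \tag{12}$$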


This equation is also valid for $\nu = 0$.

Hoel's result can be stated in the form: When $H_{\nu-1}$ is given, $\nabla^2 K_{\nu+1}$ has asymptotically a gamma variate distribution with $\nabla^2 t^{\nu+1}$ degrees of freedom if $\nu \ge 1$, and we may add that for $\nu = 0$ it has asymptotically a gamma variate distribution with $t - 1$ degrees of freedom, since it then reduces to the likelihood ratio test for 'perfect' (equiprobable) randomness against randomness in general. Clearly

$$-2\log\lambda_{\mu,\nu} = -2\log\lambda_{\mu,\mu+1} - \ldots - 2\log\lambda_{\nu-1,\nu}$$
$$= \nabla^2 K_{\mu+2} + \nabla^2 K_{\mu+3} + \ldots + \nabla^2 K_{\nu+1}$$
$$= \nabla K_{\nu+1} - \nabla K_{\mu+1} \quad (-1 \le \mu < \nu). \tag{13}$$


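If we could assume that the variables $-2\log\lambda_{\nu-1,\nu}$ ($\nu = 0, 1, 2, \ldots$) were asymptotically independent, given $H_\mu$, it would follow that $-2\log\lambda_{\mu,\nu}$ has a gamma variate distribution with degrees of freedom

$$\left.\begin{aligned} &\nabla t^{\nu+1} - \nabla t^{\mu+1} && (\mu \ge 0), \\ &\nabla t^{\nu+1} && (\mu = -1), \end{aligned}\right\} \tag{14}$$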


and we could use these results for testing $H_\mu$ within $H_\nu$. Unfortunately, the independence does not seem to be easy to prove. Nevertheless, the result just stated is correct and can be proved by precisely the same method as Hoel used, except that his suffix $i$ is to be replaced throughout his proof by a sequence of $\nu - \mu$ suffixes. It is unnecessary to repeat the argument since the modifications are entirely trivial. The conjecture that the variables $-2\log\lambda_{\nu-1,\nu}$ are asymptotically independent is strengthened by the knowledge that the above deduction from it is correct.

4. The value of the generalization of Hoel's results is that there may not be a significant distinction between 'adjacent' hypotheses in the sequence $H_\mu, H_{\mu+1}, \ldots, H_\nu$, yet $H_\mu$ may be clearly rejectable by the statistic $\nabla K_{\nu+1} - \nabla K_{\mu+1}$. If we put $\mu = 0$ [$\mu = -1$] we have a test for randomness ('perfect' randomness) within Markovity of order $\nu$. If $\mu = 0$ and $\nu = 1$ we obtain a test that is a special case of the likelihood ratio test for contingency tables. (See, for example, Wilks, 1946, p. 220, where, however, there is a minor slip in that the expression given as $\lambda$ is really $1/\lambda$.)

5. There is clearly a strong analogy between the expressions $\nabla K_{\nu+1} - \nabla K_{\mu+1}$ and $\nabla\psi_{\nu+1}^2 - \nabla\psi_{\mu+1}^2$. When $H_{-1}$ is true the latter expression also has asymptotically a gamma variate distribution with the same number of degrees of freedom as the former expression, since (as shown by Good (1953)) the variables $\nabla^2\psi_\nu^2$ ($\nu = 1, 2, \ldots$) are asymptotically independent, at any rate when $t$ is a prime number. This analogy is no coincidence since $\nabla^2\psi_{\nu+1}^2$ is the asymptotic form of $\nabla K_{\nu+1} - \nabla K_\nu$ when $H_{-1}$ is true.

6. We may also use the cyclic definitions. When $H_\mu$ is true, $\nabla\bar K_{\nu+1} - \nabla\bar K_{\mu+1}$ has asymptotically a gamma variate distribution with a number of degrees of freedom given by (14). The cyclic definition is mathematically simpler than the non-cyclic one and makes the checks

En RUE S En, = му 
- Ty т 
precise.
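The cyclic form of the statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original note: the function names are hypothetical, the sketch is restricted to $0 \le \mu < \nu$ (so that the convention for $\bar K_{-1}$ is not needed), and the observations are assumed to be given as a list of hashable states.

```python
import math
from collections import Counter


def cyclic_counts(seq, v):
    """Frequencies of v-sequences in the cyclic sequence, i.e. with the
    first element treated as immediately following the last one."""
    n = len(seq)
    return Counter(tuple(seq[(i + j) % n] for j in range(v)) for i in range(n))


def k_bar(seq, v):
    """2 * sum(n log n) over the cyclic v-sequence counts. For v = 0 the
    single empty sequence occurs N times, which gives 2 N log N."""
    return 2.0 * sum(c * math.log(c) for c in cyclic_counts(seq, v).values())


def grad_k_bar(seq, v):
    """Backward difference of k_bar with respect to the order v (v >= 1)."""
    return k_bar(seq, v) - k_bar(seq, v - 1)


def lr_statistic(seq, mu, nu):
    """Cyclic statistic for testing order mu within order nu (0 <= mu < nu)."""
    if not 0 <= mu < nu:
        raise ValueError("this sketch covers 0 <= mu < nu only")
    return grad_k_bar(seq, nu + 1) - grad_k_bar(seq, mu + 1)


def degrees_of_freedom(t, mu, nu):
    """Degrees of freedom from (14) for mu >= 0, with t possible states."""
    return (t ** (nu + 1) - t ** nu) - (t ** (mu + 1) - t ** mu)
```

For $\mu = 0$ and $\nu = 1$ the value returned by `lr_statistic` coincides with the usual likelihood ratio statistic for the $t \times t$ contingency table of cyclic pair counts, which gives a convenient check on the arithmetic.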

7. Not much is known concerning the accuracy of the asymptotic formulae when $N$ is specified. When testing $H_{-1}$ the psi-squared statistic has the advantage that its expected value is precisely known, but in principle this statistic may well be less powerful than the likelihood ratio. Unless $N$ is very small there is probably little to choose between the cyclic and non-cyclic forms of the latter.

8. Another statistic that would be worth consideration would be $\nabla L_{\nu+1} - \nabla L_{\mu+1}$ (and its cyclic


form), where L,-2Xlogn! (v21, ъ= 21080,  L4-0 (15) 
т 


When $\nu = 1$ and $\mu = 0$ this statistic reduces to $\nabla^2 L_2$, which is minus twice the log likelihood of the $n_{r,s}$, regarded as forming the interior of a contingency table for which the marginal totals are assigned and independence is assumed.

REFERENCES 


BARTLETT, M. S. (1951). The frequency goodness of fit for probability chains. Proc. Camb. Phil. Soc. 47, 86-95.

GOOD, I. J. (1953). The serial test for sampling numbers and other tests for randomness. Proc. Camb. Phil. Soc. 49, 276-84.

HOEL, P. G. (1954). A test for Markoff chains. Biometrika, 41, 430-3.

WILKS, S. S. (1946). Mathematical Statistics. Princeton.


Exact forms of some invariants for distributions admitting sufficient statistics 


By V. S. HUZURBAZAR 
University of Poona, India 


1. INTRODUCTION 
The concept of distance in statistics is due to Mahalanobis (1936) who defined the distance between two multivariate normal populations. Later Jeffreys (1946, 1948) defined a wider class of invariants of probability distributions which may be regarded as providing measures of distance between two probability distributions, in general. If $p$ and $p'$ are the density functions of two probability distributions of a variate $x$, then the expressions defined by Jeffreys

$$I_m = \int |p'^{1/m} - p^{1/m}|^m \, dx, \tag{1}$$

and

$$J = \int (p' - p)\log(p'/p) \, dx, \tag{2}$$

are all positive definite and invariant for all non-singular transformations of the variate and the parameters. These expressions, therefore, provide measures of distance between two distributions. For discrete distributions $p$ and $p'$ are simply the probabilities that $x$ takes a particular value, and the integrals are replaced by sums. Extension to multivariate distributions follows immediately.
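For discrete distributions the two invariants can be evaluated directly as sums. The following Python sketch is illustrative only; the function names and the two example distributions are assumptions introduced here, and strictly positive probabilities on a common support are assumed so that the logarithm in $J$ is defined.

```python
import math


def invariant_i(p, q, m):
    """I_m = sum over the support of |q^(1/m) - p^(1/m)|^m."""
    return sum(abs(b ** (1.0 / m) - a ** (1.0 / m)) ** m for a, b in zip(p, q))


def invariant_j(p, q):
    """J = sum over the support of (q - p) * log(q / p).
    Assumes every probability is strictly positive."""
    return sum((b - a) * math.log(b / a) for a, b in zip(p, q))


# Two hypothetical distributions on three points.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

print(invariant_i(p, q, 2), invariant_j(p, q))
```

Both quantities vanish when the two distributions coincide and are unchanged when the two arguments are interchanged, in line with their use as measures of distance.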

An important case is the distance between two distributions having the same mathematical form but with different sets of values of the parameters. If the corresponding parameters in the two sets differ by infinitesimals, we get the differential forms of the invariants $I_m$ and $J$. The invariants $I_2$ and $J$ have interesting mathematical properties and have been used by Jeffreys in stating the prior probabilities of parameters in his theory of estimation and tests of significance. Though these invariants have not yet attracted the attention of the frequency theorists of probability, it is likely that they may find applications in their work also.

Jeffreys has obtained the exact forms of the invariants $I_2$ and $J$ for the univariate and bivariate normal distributions, the Poisson and the binomial distributions; and for these distributions the exact forms come out as explicit functions of the parameters of distributions.

The object of this paper is to prove, in general terms, a remarkable property that for all distributions admitting sufficient statistics the exact forms of $I_m$ ($m$ even) and $J$ come out as explicit functions of the parameters of distributions. It may be mentioned here that the properties of $I_m$ ($m \ne 2$) have so far remained uninvestigated, the form of $I_m$ being then very complicated.


2. DISTRIBUTIONS ADMITTING SUFFICIENT STATISTICS 


We shall take the most general form of distributions admitting sufficient statistics as given by Koopman (1936),

$$f(x, \alpha_i) = \exp\Big\{\sum_{k=1}^{p} u_k(\alpha_i)\,v_k(x) + A(x) + B(\alpha_i)\Big\}, \tag{3}$$

where, for brevity, $(\alpha_i)$ denotes the set of $p$ parameters $(\alpha_1, \alpha_2, \ldots, \alpha_p)$; $u_k$ and $B$ are functions of the $\alpha_i$ and $v_k$ and $A$ are functions of $x$. For multivariate distributions $(x)$ is to be replaced by the set of variates $(x_1, x_2, \ldots, x_n)$.


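Since $\int f(x, \alpha_i)\,dx = 1$, for all $\alpha_i$, we have

$$\int \exp\Big\{\sum_{k=1}^{p} u_k(\alpha_i)\,v_k(x) + A(x)\Big\}\,dx = \exp\{-B(\alpha_i)\}. \tag{4}$$

Now the $u_k(\alpha_i)$ are $p$ independent functions of the $p$ parameters $\alpha_i$. We can express the $\alpha_i$ inversely as functions of the $u_k$'s. Then $B(\alpha_i)$ can be expressed in terms of the $u_k$'s as

$$B(\alpha_i) = b(u_k). \tag{5}$$

Then (4) becomes

$$\int \exp\Big\{\sum_{k=1}^{p} u_k\,v_k(x) + A(x)\Big\}\,dx = \exp\{-b(u_k)\}. \tag{6}$$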


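3. EXACT FORM OF $I_2$

Let

$$f(x, \alpha_i) = \exp\Big\{\sum_{k=1}^{p} u_k(\alpha_i)\,v_k(x) + A(x) + B(\alpha_i)\Big\}, \tag{7}$$

and

$$f(x, \alpha_i') = \exp\Big\{\sum_{k=1}^{p} u_k(\alpha_i')\,v_k(x) + A(x) + B(\alpha_i')\Big\}, \tag{8}$$

so that $f(x, \alpha_i)$ and $f(x, \alpha_i')$ have the same mathematical form but different sets of values of the parameters. We have

$$I_2 = \int \big[\{f(x, \alpha_i)\}^{1/2} - \{f(x, \alpha_i')\}^{1/2}\big]^2\,dx$$

$$= 2 - 2\int \{f(x, \alpha_i)\,f(x, \alpha_i')\}^{1/2}\,dx$$

$$= 2 - 2\int \exp\Big[\sum_{k=1}^{p} \tfrac12\{u_k(\alpha_i) + u_k(\alpha_i')\}\,v_k(x) + A(x) + \tfrac12 B(\alpha_i) + \tfrac12 B(\alpha_i')\Big]\,dx$$

$$= 2 - 2\exp\{\tfrac12 B(\alpha_i) + \tfrac12 B(\alpha_i')\}\int \exp\Big[\sum_{k=1}^{p} \tfrac12\{u_k(\alpha_i) + u_k(\alpha_i')\}\,v_k(x) + A(x)\Big]\,dx. \tag{9}$$

Writing $\tfrac12 u_k(\alpha_i) + \tfrac12 u_k(\alpha_i')$ for $u_k$ in (6), we have

$$\int \exp\Big[\sum_{k=1}^{p} \{\tfrac12 u_k(\alpha_i) + \tfrac12 u_k(\alpha_i')\}\,v_k(x) + A(x)\Big]\,dx = \exp\big[-b\{\tfrac12 u_k(\alpha_i) + \tfrac12 u_k(\alpha_i')\}\big]. \tag{10}$$

From (9) and (10) we have

$$I_2 = 2 - 2\exp\big[\tfrac12 B(\alpha_i) + \tfrac12 B(\alpha_i') - b\{\tfrac12 u_k(\alpha_i) + \tfrac12 u_k(\alpha_i')\}\big]. \tag{11}$$

The curious point to be noted is that the function $A(x)$ remains unaltered in the integral on the left-hand side of (10) which enables us to evaluate that integral explicitly in virtue of (6).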


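3.1. Illustrative example

Consider the Type III distribution

$$f(x, a, p) = \frac{a^p x^{p-1} e^{-ax}}{\Gamma(p)} \quad (0 \le x < \infty). \tag{12}$$

We write

$$f(x, a, p) = \exp\{-ax + p\log x - \log x + p\log a - \log\Gamma(p)\}. \tag{13}$$

Here $u_1 = a$, $u_2 = p$, $B = p\log a - \log\Gamma(p)$. Expressing $B$ in terms of $u_1$ and $u_2$,

$$B = u_2\log u_1 - \log\Gamma(u_2) = b(u_1, u_2),$$

$$b\Big(\frac{a+a'}{2}, \frac{p+p'}{2}\Big) = \Big(\frac{p+p'}{2}\Big)\log\Big(\frac{a+a'}{2}\Big) - \log\Gamma\Big(\frac{p+p'}{2}\Big).$$

Hence

$$I_2 = 2 - 2\exp\Big[\tfrac12 p\log a - \tfrac12\log\Gamma(p) + \tfrac12 p'\log a' - \tfrac12\log\Gamma(p') - \Big(\frac{p+p'}{2}\Big)\log\Big(\frac{a+a'}{2}\Big) + \log\Gamma\Big(\frac{p+p'}{2}\Big)\Big]$$

$$= 2 - \frac{2\,\Gamma\big(\frac{p+p'}{2}\big)\,a^{p/2}\,a'^{p'/2}}{\{\Gamma(p)\,\Gamma(p')\}^{1/2}\,\big(\frac{a+a'}{2}\big)^{(p+p')/2}}. \tag{14}$$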
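As a numerical check on the Type III example, the closed form (14) can be compared with a direct numerical evaluation of $\int (f^{1/2} - f'^{1/2})^2\,dx$. The Python sketch below is an illustration only: the function names, the midpoint rule, the truncation point `upper` and the parameter values used in the usage line are all assumptions introduced here, and the check is intended for $p, p' \ge 1$, where the densities are bounded.

```python
import math


def log_type3(x, a, p):
    """Logarithm of the Type III density (12) at x > 0."""
    return p * math.log(a) + (p - 1.0) * math.log(x) - a * x - math.lgamma(p)


def i2_closed_form(a, p, a2, p2):
    """Equation (14), evaluated on the log scale for numerical stability."""
    half_p = 0.5 * (p + p2)
    log_term = (0.5 * p * math.log(a) - 0.5 * math.lgamma(p)
                + 0.5 * p2 * math.log(a2) - 0.5 * math.lgamma(p2)
                - half_p * math.log(0.5 * (a + a2))
                + math.lgamma(half_p))
    return 2.0 - 2.0 * math.exp(log_term)


def i2_numerical(a, p, a2, p2, upper=60.0, steps=200_000):
    """Midpoint-rule approximation to the integral of (sqrt f - sqrt f')^2
    over (0, upper); the tail beyond `upper` is assumed negligible."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        d = (math.exp(0.5 * log_type3(x, a, p))
             - math.exp(0.5 * log_type3(x, a2, p2)))
        total += d * d
    return total * h


# Hypothetical parameter values: (a, p) = (2, 3) and (a', p') = (1, 2).
print(i2_closed_form(2.0, 3.0, 1.0, 2.0), i2_numerical(2.0, 3.0, 1.0, 2.0))
```

The two printed values should agree to within the error of the quadrature, and the closed form reduces to zero when the two parameter sets coincide.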
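4. THE EXACT FORM OF $J$

$$J = \int \{f(x, \alpha_i') - f(x, \alpha_i)\}\{\log f(x, \alpha_i') - \log f(x, \alpha_i)\}\,dx$$

$$= \sum_{k=1}^{p} \{u_k(\alpha_i') - u_k(\alpha_i)\}\Big\{\int v_k(x)\,f(x, \alpha_i')\,dx - \int v_k(x)\,f(x, \alpha_i)\,dx\Big\}$$

$$= \sum_{k=1}^{p} \{u_k(\alpha_i') - u_k(\alpha_i)\}\{E'v_k(x) - Ev_k(x)\}, \tag{15}$$

where $E'v_k(x)$ and $Ev_k(x)$ denote the expectations of $v_k(x)$ when the parameters are $\alpha_i'$ and $\alpha_i$ respectively. For brevity write $E'v_k(x) = E_k'$, $Ev_k(x) = E_k$. Then

$$J = \sum_{k=1}^{p} (u_k' - u_k)(E_k' - E_k). \tag{16}$$

$E_k'$ and $E_k$ can be obtained as follows. We have

$$\frac{\partial}{\partial\alpha_r}\log f(x, \alpha_i) = \sum_{k=1}^{p} \frac{\partial u_k}{\partial\alpha_r}\,v_k(x) + \frac{\partial B}{\partial\alpha_r}.$$

Since

$$E\Big\{\frac{\partial}{\partial\alpha_r}\log f(x, \alpha_i)\Big\} = 0,$$

we have

$$\sum_{k=1}^{p} \frac{\partial u_k}{\partial\alpha_r}\,E_k + \frac{\partial B}{\partial\alpha_r} = 0. \tag{17}$$

Setting $r = 1, 2, \ldots, p$ in (17), we have $p$ simultaneous linear equations to determine the $E_k$. Then $E_k'$ are obtained by writing $\alpha_i'$ for $\alpha_i$ in $E_k$. Thus (15) and (17) enable us to express $J$ explicitly in terms of the parameters.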


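The above formulae are greatly simplified if we take the $u_k$ as parameters and express $B(\alpha_i)$ in terms of them. Then (17) becomes simply

$$E_k + \frac{\partial b}{\partial u_k} = 0, \quad\text{so that}\quad E_k = -\frac{\partial b}{\partial u_k}.$$

From (16),

$$J = \sum_{k=1}^{p} (u_k' - u_k)\Big(\frac{\partial b}{\partial u_k} - \frac{\partial b'}{\partial u_k'}\Big). \tag{18}$$

Writing $u_k' - u_k = du_k$, the differential form of $J$ is

$$J = -\sum_{k=1}^{p}\sum_{l=1}^{p} \frac{\partial^2 b}{\partial u_k\,\partial u_l}\,du_k\,du_l. \tag{19}$$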
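A one-parameter case makes the recipe concrete. The Poisson distribution with mean $\lambda$ can be put in the form (3) with $u = \log\lambda$, $v(x) = x$, $A(x) = -\log x!$ and $B = -\lambda$, so that $b(u) = -e^u$ and $E = -\partial b/\partial u = \lambda$; this identification is worked out here for illustration and is not taken from the text. Formula (16) then gives $J = (\log\lambda' - \log\lambda)(\lambda' - \lambda)$, which the Python sketch below compares with a direct summation of definition (2). The function names and the truncation point `x_max` are hypothetical.

```python
import math


def j_closed_form(lam, lam2):
    """Formula (16) for the Poisson family: u = log(lambda), E = lambda."""
    return (math.log(lam2) - math.log(lam)) * (lam2 - lam)


def j_direct(lam, lam2, x_max=200):
    """Direct sum of (p' - p) * log(p' / p) over x = 0, ..., x_max.
    The truncation at x_max is an assumption suitable for moderate means."""
    total = 0.0
    for x in range(x_max + 1):
        log_p = x * math.log(lam) - lam - math.lgamma(x + 1)
        log_q = x * math.log(lam2) - lam2 - math.lgamma(x + 1)
        total += (math.exp(log_q) - math.exp(log_p)) * (log_q - log_p)
    return total


# Hypothetical means 2.0 and 3.5.
print(j_closed_form(2.0, 3.5), j_direct(2.0, 3.5))
```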


5. THE EXACT FORM OF I,, (m EVEN) 
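For brevity write

$$f(x, \alpha_i) = f, \quad f(x, \alpha_i') = f', \quad u_k(\alpha_i) = u_k, \quad u_k(\alpha_i') = u_k', \quad B(\alpha_i) = B, \quad B(\alpha_i') = B', \text{ etc.}$$

Now

$$I_m = \int |f^{1/m} - f'^{1/m}|^m\,dx$$

$$= \int (f^{1/m} - f'^{1/m})^m\,dx \quad\text{(since $m$ is even)}$$

$$= \sum_{r=0}^{m} (-1)^r \binom{m}{r} A_r, \tag{20}$$

where

$$A_r = \int f^{(m-r)/m}\,f'^{r/m}\,dx$$

$$= \int \exp\Big[\sum_{k=1}^{p} \Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)v_k(x) + A(x) + \frac{m-r}{m}B + \frac{r}{m}B'\Big]\,dx$$

$$= \exp\Big(\frac{m-r}{m}B + \frac{r}{m}B'\Big)\int \exp\Big[\sum_{k=1}^{p} \Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)v_k(x) + A(x)\Big]\,dx. \tag{21}$$

Writing $\frac{m-r}{m}u_k + \frac{r}{m}u_k'$ for $u_k$ in (6), we have

$$\int \exp\Big[\sum_{k=1}^{p} \Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)v_k(x) + A(x)\Big]\,dx = \exp\Big[-b\Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)\Big]. \tag{22}$$

From (21) and (22),

$$A_r = \exp\Big[\frac{m-r}{m}B + \frac{r}{m}B' - b\Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)\Big].$$

Then from (20),

$$I_m = \sum_{r=0}^{m} (-1)^r \binom{m}{r} \exp\Big[\frac{m-r}{m}B + \frac{r}{m}B' - b\Big(\frac{m-r}{m}u_k + \frac{r}{m}u_k'\Big)\Big]. \tag{23}$$

Curiously again, the function $A(x)$ remains unaltered in the integral in (22), which enables us to evaluate that integral explicitly in virtue of (6).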
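Formula (23) can be checked numerically in the same way. For the Poisson distribution with mean $\lambda$, written in the form (3) with $u = \log\lambda$, $B = -\lambda$ and $b(u) = -e^u$ (an identification made here for illustration, not given in the text), the exponent in (23) becomes $-\frac{m-r}{m}\lambda - \frac{r}{m}\lambda' + \lambda^{(m-r)/m}\lambda'^{r/m}$. The Python sketch below compares the resulting finite sum with a direct summation of $|p^{1/m} - p'^{1/m}|^m$ for even $m$; the function names and the truncation point `x_max` are hypothetical.

```python
import math


def i_m_closed_form(lam, lam2, m):
    """Formula (23) for the Poisson family, valid for even m."""
    total = 0.0
    for r in range(m + 1):
        w = r / m
        exponent = -(1.0 - w) * lam - w * lam2 + lam ** (1.0 - w) * lam2 ** w
        total += (-1) ** r * math.comb(m, r) * math.exp(exponent)
    return total


def i_m_direct(lam, lam2, m, x_max=200):
    """Direct sum of |p^(1/m) - p'^(1/m)|^m over x = 0, ..., x_max."""
    total = 0.0
    for x in range(x_max + 1):
        log_p = x * math.log(lam) - lam - math.lgamma(x + 1)
        log_q = x * math.log(lam2) - lam2 - math.lgamma(x + 1)
        total += abs(math.exp(log_p / m) - math.exp(log_q / m)) ** m
    return total


# Hypothetical means 2.0 and 3.5 with m = 4.
print(i_m_closed_form(2.0, 3.5, 4), i_m_direct(2.0, 3.5, 4))
```

Since the closed form is an alternating sum, some cancellation is to be expected when the two parameter values are close together, so the direct sum may be the safer computation in that case.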


REFERENCES 


JEFFREYS, H. (1946). An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. A, 186, 453.

JEFFREYS, H. (1948). Theory of Probability, 2nd ed. Oxford: Clarendon Press.

KOOPMAN, B. O. (1936). On distributions admitting a sufficient statistic. Trans. Amer. Math. Soc. 39, 399.

MAHALANOBIS, P. C. (1936). On the generalized distance in statistics. Proc. Nat. Inst. Sci. Ind. 12, 49.




REVIEWS 


Demand Analysis. A Study in Econometrics. By HERMAN WOLD, in association with
LARS JURÉEN. New York: Wiley and Sons (Stockholm: Almqvist and Wiksell).
1953. Pp. xvi+358. 56s.


Originating in a study of consumer demand for food in Sweden undertaken in 1938 by Prof. Wold and
Mr Juréen, this book sets out to present a self-contained account of both methods and results. It
includes a report on the empirical findings, based on family budget surveys and market statistics, a 
section giving a survey of methods in non-technical language, and sections dealing in more concentrated 
and technical manner with some aspects of economic theory, stochastic processes and regression 
analysis. 

The theoretical sections are of considerable incidental interest for econometrics generally, although 
the developments, on an abstract level, are occasionally only remotely relevant to the reported 
empirical study. 

The author states that one of his principal aims was to justify the use of the traditional method of 
least-squares regression, via the use of economic models of so-called recursive type. His main argument 
rests on the notion of causality in consumer demand situations. With this as a guide to the method 
of application, he obtains, among others, the result that the regression method for the estimation of 
recursive systems is unbiased and consistent on the assumption of non-correlation between the 
explanatory variables and the disturbances, irrespective of whether there is inter-correlation between 
these disturbances. A generalization of the classical formula is derived for the standard error of a 
regression coefficient which allows for intercorrelation. 

The wide scope of recursive models is indicated by a proof that any set of time series has a formal 
representation in terms of a recursive system. 

However the causality argument is regarded (and it is certainly not a straightforward one), the 
regression method would still appear on Wold’s showing to compare favourably, in demand analysis 
at least, with other methods which have been proposed. 

The book is very fully annotated; there is an extensive bibliography (although several of the dated 
references in the text are omitted), and important sets of exercises accompany the theoretical parts. 
Based on years of research into the foundations of the subject, this is a very thorough work of scholarship. 


F. G. FOSTER


A Study in the Analysis of Stationary Time-series. 2nd edition. By HERMAN WOLD,
with an Appendix by PETER WHITTLE. Stockholm: Almqvist and Wiksell. 1954.
Pp. viii+236. 42s.


As far as the main text is concerned, the present edition differs little from the first, published in 1938, 
the only change of major importance being the replacement of appendices A and B in the first edition 
by two new ones. 

In the first appendix, Prof. Wold makes short but useful comments on certain parts of the text, 
mainly in the light of recent work. Appendix 2, comprising 32 pages, is written by Dr P. Whittle and 
is devoted to a survey of recent advances in the subject.

The first three chapters of the book dealing with Prof. Wold’s fundamental contributions to the 
theory of discrete stationary schemes are by now familiar to most statisticians and need no further 
comment. Chapter IV, on the application of stationary schemes to experimental data, has not been
as successful in withstanding the passage of time, and the discussion tends to be outmoded in view of 
the advances made in the testing of specific hypotheses and in the problem of estimation. 

Chapter 1 of Appendix 2 is devoted to the contributions of Cramér, Kolmogoroff and Wiener to 
spectral theory and Chapter 2 to a synopsis of Whittle’s own work in the theory of inference, together 
with some miscellaneous results in distribution theory and periodogram analysis. To attempt even a
cursory review of the field in so short a space is not practicable, and the result is a one-sided exposition 
of the subject. Even so, the treatment is interesting and some new results are introduced. 




The emphasis laid throughout on the mathematical and structural, rather than the statistical aspects 
of time-series analysis, is likely to make the book of greater utility to the mathematical statistician than 
to one who is interested in the applications to economics, physics, meteorology, etc. The latter is also
likely to view with extreme pessimism the warnings made at intervals throughout the book about 
the reliability of the information provided by the analysis. Briefly stated, the argument is that signifi- 
cance problems are complicated by the fact that one is determining the whole structure of a series from 
a realization or sample and also that the attachment of quantitative significance to the sample func- 
tions is not possible since they are conditioned by the size of the statistical masses to which they refer; 
in particular, there is arbitrariness in (а) the time unit of collection of the data, (b) the region in space 
to which this unit refers. A great deal has been done to remove the first objection by realizing the 
futility of testing individual serial correlations, and much progress has been made in discriminating 
between various hypotheses. The evidence up to the present seems to indicate that the problem is 
soluble in terms of the classical theory. 

The second objection is more troublesome but is not so serious as one might be led to expect. As far
as (a) is concerned, the choice of time unit raises no special difficulty, since in many instances the auto- 
correlation properties of time-series are not in themselves of fundamental importance. Thus whilst 
collecting data at quarterly intervals instead of annually is likely to alter the structural form of the 
series, if it were required to estimate the correlation between two time-series, the ultimate aim would be
to eliminate the effect of autocorrelation during the course of the analysis. The intrinsic autocorrelative 
properties of a series are likely to be more important in the problem of prediction, but recent work 
seems to suggest that as far as practical applications are concerned much remains to be done in this 
field. 

The difficulties raised by (b) are more serious, but this type of objection is latent in, and tends to 
restrict the conclusions which are drawn in many other problems in statistics. 

The need for further research is obvious, and Prof. Wold’s own work on this problem of quantitative 
significance is sufficient to raise the question that perhaps our assumptions in dealing with observed
series are too stringent. G. M. JENKINS


An Introduction to Stochastic Processes with special reference to Methods and 
Applications. By M. S. BARTLETT. Cambridge University Press. 1955. Pp. 
xiv + 312. 35s.


Probability problems concerned with the description of changes with time—where we use ‘time’ in the 
operational sense—are as old as the theory itself. The entire development of the subject in the sixteenth
and seventeenth centuries was due to the zeal and passion of gamblers and, among other queries, the
probability of ruin was one which ranked importantly. Interest in this type of problem never faded out,
and we find it discussed by one after another of the great probabilists right up to the present day. The
difference between the moderns and their predecessors is that problems involving the dynamics of change
are now given a special title—stochastic processes—and are subject to a unified systematic attack.

This systematic attack, which has its origins chiefly in the research of the Russian school of pro- 
bability and in particular in that of Kolmogoroff, has resulted in a great deal of elegant mathematics 
and a certain amount of useful statistical methods. Its great defect has been that the problem of 
stochastic processes has been lifted beyond the understanding of many statisticians who still continue 
to solve the problems in ad hoc fashion as they arise.

In 1946 M. S. Bartlett attempted to remedy this state of affairs by publishing in mimeographed form
the lectures which he had given in the University of North Carolina. Further clarification, in the dis- 
crete case, came with W. Feller's book Introduction to Probability Theory, and now we again have Prof. 
Bartlett who attempts to avoid too much mathematical abstraction and gives a general survey of 
techniques and applications. Topics on pure theory covered by him are random sequences, processes in
continuous time, limiting stochastic operations and stationary processes. The problem of the random 
walk, Markoff chains, renewal processes, stochastic convergence and spectral analysis are included. 
Under statistical applications we have a chapter on applications of the random walk, queues, population 
growth and epidemic models, a chapter on statistical inference in stochastic processes and a chapter on 
the correlation analysis of time series including harmonic analysis.

This book is the best which has yet been produced on this specialized topic. It is lucidly written by
someone who clearly sees the statistical implications of the abstract theory and may be read without
undue difficulty by any student who has two years of university mathematical training behind him.
It will undoubtedly be found indispensable by anyone wishing to learn something of a subject which
has engaged the attention of nearly all probabilists in the past decade. F. N. DAVID




Probability Theory. By M. LOÈVE. New York: D. van Nostrand Company, Inc. 1955.
Pp. xv +501. $12.00. 


This is a book on probability theory written by a mathematician who has himself made distinguished 
contributions to the subject during the past twenty years. Prof. Loève divides his book into five parts.
In the introductory part we have what might be called the intuitive approach to probability which is
very often all that the statistician needs. In spite of the fact that the topics are not new they are set
out in clear and pleasing fashion, and the corollaries to the various theorems proved, particularly in 
the laws of large numbers, will be fresh to those who are not thoroughly familiar with the French and 
Russian literature. The notion of chain dependence is also introduced at this stage. All this is intro- 
ductory. 

In Part One proper Prof. Loève sets us sternly to work with 83 pages on ‘Notions of Measure
Theory’. Here we have additive set functions, topological spaces, measurable functions and Lebesgue
integration. The two chapters which compose this part can be read fairly easily by a student who has
taken a university degree course in mathematics, but the reasoning is rather condensed and may cause
difficulty to students coming to it for the first time. In Part II, entitled ‘General Concepts and Tools
of Probability Theory’, we have the development of probability laws, characteristic functions and 
distribution functions, much of which is in essence familiar although not perhaps the rigorous pre- 
sentation. 

With ‘Independence’, which is the topic of Part Three, Prof. Loève gives us a characteristically
thorough thrashing out of convergence in probability and central limit theorems. Little of this has
appeared explicitly in English, and it is useful to have it all down in a moderately compact form 
(107 pages). From ‘Independence’ it is natural that we should turn to ‘Dependence’, the topic of Part 
Four. Here, apart from the clear and mathematically rigorous presentation, there is little which has 
not been covered by Doob and Bartlett, except perhaps the second-order random functions. 

This book will be excellent for those with a mathematical training who wish to specialize in pro- 
bability theory regarded as a mathematical discipline. It is throughout logically inevitable in its 
presentation and written as rigorously as only a French-trained mathematician can. There can be no 
better beginning for a young mathematical student than to be given this book. It will, however, not 
be helpful in the training of the statistics student. Many a promising statistician has been wrecked on 
the desert island of mathematical rigour, and those trained in modern mathematical techniques often 
find the necessarily heuristic approach to numerical data painful. However, for the class of student 


indicated the book is excellent and all providing. It is unfortunate that it has to be recorded that the
price is $12. F. N. DAVID


Statistical Inference. By HELEN M. WALKER and JOSEPH LEV. New York: Henry
Holt and Company. 1953. Pp. xi+510. 48s.


This text-book on statistical method is designed for beginners with little or no mathematical
background. A good many mathematical formulae are, however, used (albeit always accompanied by
careful explanation and practice exercises), and a student who has mastered such a text as Prof. 
Walker's previous book, Mathematics Essential for Elementary Statistics, would find the going easier. 
In choice of subject-matter and general emphasis it is definitely ‘applied’, the application being to
educational research; this is not, however, revealed by the title.

In scope it is fairly comprehensive. It begins with the general idea of statistical inference and an
intuitive discussion of hypothesis testing. The concept of probability is introduced, then the binomial 
distribution, with a discussion of the closeness of the normal approximation. By use of a population of 
two classes only, a variety of problems of hypothesis testing and estimation are treated, the account 
including such topics as one- and two-sided tests, critical regions, errors of the first and second types, 
power functions, confidence limits—concepts which gain in clarity for the beginner by definition in this 
simple context. Populations with more than two classes are next studied, and then some of the more 
important concepts regarding distributions on a continuous variable. There follows an account of a
class project in sampling which is designed to familiarize the reader with the statistics possessing one 
of the four continuous distributions of major importance: the Normal, Chi-square, Student's and F dis- 
tributions. The method here is to plot the empirically derived distribution, superimpose the mathe- 
matical curve, and then introduce the student to the table of this function in place of its mathematical 
formula. With this basic material assembled, the subjects now treated include: inferences concerning 




the mean or difference between two means; inferences concerning variances and standard deviations; 
analysis of variance; linear regression and correlation; biserial, point biserial, phi, tetrachoric and rank- 
order correlation coefficients and the correlation ratio; the effects of measurement errors and the 
reliability coefficient; multiple regression and correlation; analysis of variance with two or more
variables of classification; analysis of covariance; percentiles; transformation of scales; non-parametric
methods. The chapter on non-parametric methods was written by Prof. Lincoln Moses. 

The treatment is concrete, and as far as possible basic ideas are explained by means of numerical 
examples. The student is drilled in computing techniques; a rather extensive set of twenty-three tables 
is an integral part of the text for this purpose, detailed instructions in their use being given. (The 
reader may on occasion be somewhat startled at being addressed directly from the text, exhorted to do 
such and such a computation.) A welcome feature of the book for educationalists should be the amount
of illustrative numerical data actually taken from published work on educational research. 

The book is well referenced, and a list of authors for further reading is placed at the ends of chapters. 
Many exercises are set, with answers given at the back of the book. There is a subject index, an author 
index and a glossary of the mathematical symbols used. 

It is to be expected of a text with which Prof. Walker is associated that the exposition should be 
pedagogically sound, and there is evidence that a great deal of care was taken about the order and 
method of presentation of the material. The result cannot be said to be completely satisfactory. It is, 
for example, open to serious doubt on occasion whether the average student will be able to absorb 
simultaneously details of computing techniques and the basic principle of the statistical method under 
explanation. Occasionally, too, the style becomes so woolly as to present quite inaccurate information.
Thus, on p. 14, two observations are defined to be independent ‘when information about one of them 
provides no clue whatever as to the other’. A reader may well be puzzled by this if he considers that 
in the case of two independent observations from the same (unknown) population, the first observation 
provides an unbiased estimate of the mean of the second. 

Despite some defects, however, research students of education and allied fields should find this text 
invaluable, not only as an instructional manual, but also as a general work of reference in statistical 
method. F. G. FOSTER


Outline of Biometry and Statistics. By C. I. BLISS and D. W. CALHOUN. New Haven,
Conn.: Yale Co-operative Corporation. 1954. Pp. 272+xvi. $4.50.


This book will be extremely useful to those who teach biometry, and to statisticians who work in a 
biological field or who wish to have biological examples to illustrate their teaching of statistics. Each 
chapter consists of a number of very concise explanatory notes, together with several illustrative 
worked examples and exercises chosen from a wide range of experimental material.

The subjects covered include the binomial, negative binomial and Poisson distributions; chi-squared
and the analysis of contingency tables; the normal distribution with the estimation of its mean and 
variance; the analysis of variance, factorial experiments and simple experimental designs; regression,
correlation and association; bio-assay. The authors regret that the proposed length of course (about
900 hours of students’ time) necessarily excludes a few other topics such as covariance, probits,
discriminants, and sequential and other sampling methods.

The professional statistician will find this book packed with information, such as short-cut formulas, 
precise statements of the limits within which approximate tests can be relied upon, and alternative 
parametric and non-parametric tests. A number of useful tables and charts are included. A long list 
of references enables the reader to consult the original papers whenever necessary, except for work yet 
unpublished. The exposition is almost always extremely lucid, and the reviewer has noticed few errors 
(apart from those already covered by a list of errata). But surely the exact test for 2 × 2 tables was
independently discovered by Fisher and by Irwin (see Metron, 12, 1935, 1)? 

The book is too condensed for the unassisted student, but one working through the course under a 
teacher’s guidance will find it very useful. CEDRIC A. B. SMITH




Statistics and Mathematics in Biology. Edited by O. KEMPTHORNE, T. A. BANCROFT,
J. W. Gowen and J. L. LusH. Ames, Iowa: Iowa State College Press. 1954. Pp. 
ix+632. $6.75. 


In case the title should mislead, it should be made clear at once that this is not a text-book on the use 
of statistical and mathematical methods in biology. It is a collection of forty-four essays by various 
authors (many of great distinction) on a wide range of topics in which biology and statistics meet. In 
many cases these deal with recent research by the author; some are highly theoretical, others concern 
practical problems involving only simple mathematics. Almost all are lucidly expressed, and can be 
read with interest and profit. The subjects discussed include causation, regression, path coefficients and 
multivariate analysis, classification problems, experimental designs, competition between species, 
growth curves, sampling of populations, toxicity tests, bio-assay, taste testing, feeding experiments, 
animal behaviour, gene frequency estimation, the effect of radiation on cells and viruses, genetic and 
breeding problems and the estimation of nucleoproteins in cell division. 

Naturally this is not the sort of book to which one can turn for complete and detailed information 
on any one topic, although the very full bibliography may indicate the source of such information. 
But it is one which it would be useful to have about the laboratory (whether statistical or biological), 
since somewhere it may suggest the appropriate line of approach to a particular problem, or indicate a
profitable new line of research. 

One minor point may be mentioned. K. R. Nair has pointed out (Bull. Calcutta Statist. Ass. 20,
1954, 18) that Quenouille's ‘almost balanced incomplete block designs’ are particular cases of the
Bose-Nair-Rao ‘partially balanced incomplete blocks’ (Bose & Nair, Sankhya, 4, 1939, 337).


CEDRIC A. B. SMITH


Statistical Analysis in Chemistry and the Chemical Industry. By C. A. BENNETT 
and N. L. FRANKLIN. New York: John Wiley and Sons, Inc. 1954. Pp. xvi + 724. 58s.


This book contains a well-written account of statistical techniques, some of which are not readily
available elsewhere. After an introductory chapter, general notions of frequency distributions, 
measures of location, spread and association are introduced. There follows a discussion of probability 
theory including some account of moment generating functions, cumulants and k statistics and the
common sampling distributions derived from the normal law. A short account of the properties of 
gamma and beta functions is followed by a chapter on confidence limits and tests of significance, in- 
cluding some non-parametric methods and a somewhat brief discussion of sequential tests. In 
Chapter 6 there is an account of linear regression, curvilinear regression and discriminant functions 
and of methods for the solution of the linear equations which arise in the application of these techniques. 

Chapter 7, which contains over 150 pages, is concerned with the analysis of variance, and this is 
followed by a discussion of almost equal length on the design of experiments. The book ends with 
chapters on the analysis of counted data, quality control and tests for randomness. 

To those workers in the chemical industry and elsewhere who have been introduced to statistical 
methods via a ‘cookery book’ and who now wish to broaden their knowledge and to learn something 
more of the basis of these methods, this book is recommended. It will be less useful to those who wish 
to know how statistics should be used in practice. Some of the examples fail to demonstrate the value 
of the method discussed, or the sort of circumstances in which it might be used, and instances occur 
where the statistical analysis confuses rather than clarifies the situation. This is particularly true of the 
example on p. 395, where the situation is much more readily appreciated from a set of four simple 
graphs than from the rather complicated analysis of variance which is given in this book. 

It is very important that the experimenter should be the master and not the slave of statistical 
methods. In particular, he must be very clear about what it is he really wants to know and should not 
allow himself to be conditioned into using statistical ideas and concepts which are not really appropriate 
for his investigation. In particular, he should not allow himself to be overawed by apparently
complicated mathematical machinery. When he feels his problem is not really answered by the application
of standard techniques he should try to deal with it from first principles. His solution even if it is 
somewhat approximate and unorthodox will usually serve him better than the misapplication of a 
mathematically ‘exact’ technique. In helping to instil these first principles this text-book is valuable;
the reader should, however, not take too seriously some of the applications which are described. 


G. E. P. BOX




A Million Random Digits with 100,000 Normal Deviates. The RAND CORPORATION.
Illinois, U.S.A.: The Free Press. 1955. Pp. xxv + 600. $10.00.


Since L. H. C. Tippett's set of 40,000 random numbers was first published in 1927 as Tracts for Com- 
puters, No. xv, there has been a steady demand for random numbers from a wide variety of users. This 
was recognized by the publication of M. G. Kendall's and B. Babington Smith's set of 100,000 digits in 
Tracts for Computers, No. xxiv, just prior to the war. Since then the demand has grown rather than
diminished, and the recent emphasis on Monte Carlo methods has led to the need for an even larger 
supply. This need has now been filled by the publication of this volume of 1,000,000 random digits 
produced with the aid of an electronic roulette wheel by the Rand Corporation. Tests for randomness, 
such as the frequency test, the so-called poker test and the pairs test, have been applied to the numbers 
and give satisfactory results.
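
As an illustration of the simplest of the tests named above, the frequency test can be sketched as a chi-square comparison of the ten digit counts with their common expectation. The function name and the choice of a plain chi-square statistic on 9 degrees of freedom are assumptions made here for illustration; the review does not describe how the tests were actually carried out.

```python
from collections import Counter


def frequency_chi_square(digits: str) -> float:
    """Chi-square statistic for the hypothesis that digits 0-9 are equally frequent.

    Hypothetical helper: under the hypothesis the statistic is approximately
    chi-square with 9 degrees of freedom for a long run of digits.
    """
    if not digits or any(c not in "0123456789" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    counts = Counter(digits)
    expected = len(digits) / 10  # each digit should appear equally often
    return sum(
        (counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10)
    )
```

A perfectly balanced run of digits gives a statistic of zero, while a run consisting of a single repeated digit gives the largest possible value for its length.
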
Half the table has been used to obtain 100,000 random normal deviates by the conversion of five- 
digit random numbers with a table of the cumulative normal distribution function. The deviates 
obtained are tabulated to three decimal places. P. G. MOORE 
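
The conversion described above amounts to treating each five-digit number as a probability and reading off the corresponding normal deviate from the inverse of the cumulative distribution function. A minimal sketch follows, assuming that the five digits are mapped to the midpoint of their interval (the offset of 0.5 is an assumption, not something the review states) and using the standard library in place of a printed table; the function name is hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def digits_to_deviate(five_digits: int) -> float:
    """Convert a five-digit random number (0..99999) to a normal deviate.

    Assumption: the number d is mapped to the probability (d + 0.5) / 100000,
    which keeps the argument strictly between 0 and 1.
    """
    if not 0 <= five_digits <= 99_999:
        raise ValueError("expected an integer between 0 and 99999")
    p = (five_digits + 0.5) / 100_000
    # Invert the cumulative normal distribution and keep three decimals,
    # matching the precision of the published deviates.
    return round(_STANDARD_NORMAL.inv_cdf(p), 3)
```

Because 100,000 deviates of five digits each consume 500,000 digits, this is consistent with the statement that half the table was used.
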


Tablitsi dlya vichisleniya nepolnoi Γ-funktsii veroyatnosti χ² (Tables for the
calculation of the incomplete Γ-function and the χ²-probability function).
By E. E. SLUTSKII (Ed. A. N. KOLMOGOROV). Moscow and Leningrad: Izdatelstvo
Akademii Nauk SSSR. 1950. Pp. 14 + 55 pp. tables.

These tables are intended to facilitate the computation of

$$P(\chi^2, n) = \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_{\chi^2}^{\infty} x^{n/2-1} e^{-x/2}\,dx$$

and the related incomplete gamma-function. There is a table of P(χ², n) for

χ²: 0 (0.1) 3.2, n: 0 (0.05) 0.2 (0.1) 6.0,
and
χ²: 3.2 (0.2) 7.0 (0.5) 10.0 (1.0) 35.0, n: 0 (0.1) 0.4 (0.2) 6.0.

There are also tables of three auxiliary functions designed for special purposes. To facilitate
interpolation for small values of χ² the function

T(χ², n) = ( … )(1 − P(χ², n))

is tabled for

χ²: 0 (0.05) 0.2 (0.1) 1.0,
n: 0 (0.05) 0.2 (0.1) 6.0.

For larger values of n the function P(χ², n) is tabled, with argument

t = √(2χ²) − √(2n)

in place of χ², for

t: −4.0 (0.1) 4.8,
n: 6.0 (0.5) 11.0 (1.0) 32.0.

A further table gives values of the function z(t, x) = P(χ², n), where t is as above and x = √(2/n). This
function is tabled for

t = −4.5 (0.1) 4.8,
x = 0 (0.02) 0.22 (0.01) 0.25.

It is claimed that this small (seven page) table effectively provides for the computation of the
incomplete gamma-function in the region (n > 102) not covered by the existing tables of K. Pearson.

A final table contains coefficients of Everett’s and Newton's interpolation formulae. Second and
fourth central differences with respect to χ² (or t), and second central differences with respect to n (or x)
are printed in the tables.

The tables are clearly reproduced and there is an introduction setting out their genesis, calculation
and use, with a number of worked examples. N. L. JOHNSON
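
For readers who wish to check an entry of such a table numerically, the following is a minimal sketch, assuming that P(χ², n) denotes the probability that a χ² variate with n degrees of freedom exceeds the given value. It uses the standard power series for the lower incomplete gamma-function, not the method by which the tables were computed, and the function names are hypothetical. The second function gives the argument t used for the larger values of n.

```python
import math


def chi2_upper_tail(chi2: float, n: float, max_terms: int = 1000) -> float:
    """Upper-tail probability P(chi2, n) for chi-square with n degrees of freedom.

    Sketch only: sums the series for the lower incomplete gamma-function,
    so very small tail probabilities lose relative accuracy.
    """
    if chi2 < 0 or n <= 0:
        raise ValueError("need chi2 >= 0 and n > 0")
    if chi2 == 0:
        return 1.0
    a, x = n / 2.0, chi2 / 2.0
    # gamma(a, x) = x^a e^(-x) * sum_{k>=0} x^k / (a (a+1) ... (a+k))
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < total * 1e-16:
            break
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return 1.0 - lower


def t_argument(chi2: float, n: float) -> float:
    """Argument t = sqrt(2 chi2) - sqrt(2 n) used in place of chi2 for larger n."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n)
```

For n = 2 the result reduces to exp(−χ²/2), which gives a convenient check on the series.
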




OTHER BOOKS RECEIVED 


1. Rank Correlation Methods, 2nd edition. By M. G. KENDALL. London: Charles
Griffin and Co. 1955. Pp. 196. 36s.

2. Manpower Shortage and the Fall of the Roman Empire. By A. E. R. BOAK.
U.S.A.: University of Michigan Press. 1955. Pp. 169. $4.50.

3. The Foundations of Statistics. By L. J. SAVAGE. U.S.A.: John Wiley and Sons;
London: Chapman and Hall. 1954. Pp. xv + 294. 48s.

4. Statistics in Research. By B. OSTLE. U.S.A.: Iowa State College Press. 1954. Pp.
xiv + 487. $6.95.

5. Annual Epidemiological and Vital Statistics for 1952. Switzerland: World Health
Organization, Geneva. 1955. Pp. x + 533. 50s.


Publications of the U.S. Department of Commerce, National Bureau of Standards 


6. Tables of Sine and Cosine Integrals for arguments from 10 to 100. Applied
Mathematics Series, 32. 1954. Pp. xv + 187. $2.25.

7. Tables of the Gamma Function for Complex Argument. Applied Mathematics
Series, 34. 1954. Pp. xvi + 105. $2.00.

8. Tables of Functions and of Zeros of Functions. Applied Mathematics Series,
37. 1954. Pp. ix + 211. $2.25.

9. Contributions to the Solution of Systems of Linear Equations and the
Determination of Eigen-values. Edited by OLGA TAUSSKY. Applied Mathematics
Series, 39. 1954. Pp. 139. $2.00.

10. Tables of the Error Function and its derivatives. Applied Mathematics Series,
41. 1954. Pp. xi + 302. $3.25.




